Tokenizing a 200 MB file with ANTLR4 consumes over 1 GB of memory. Here is the code I am using to process the file:
```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.UnbufferedTokenStream;

try (InputStream inputStream = new FileInputStream(file)) {
    CharStream charStream = CharStreams.fromStream(inputStream);
    Lexer lexer = new MySqlLexer(charStream);
    UnbufferedTokenStream<Token> tokenStream = new UnbufferedTokenStream<>(lexer);
    while (true) {
        Token token = tokenStream.LT(1);
        int tokenType = token.getType();
        if (tokenType == Token.EOF) {
            break;
        }
        // ... other code (advances the stream via tokenStream.consume())
    }
} catch (Exception e) {
    // Handle exception
}
```
I noticed that memory usage spikes to over 1 GB as soon as the token stream is created. I am using ANTLR4 4.13.1 on macOS 15.2, with the MySqlLexer.g4 grammar.
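For reference, here is a sketch of the fully unbuffered setup I would have expected to keep memory flat, based on my reading of the `UnbufferedCharStream` and `CommonTokenFactory` javadocs. I have not confirmed this is the intended usage, and MySqlLexer may still require more lookahead than the unbuffered stream retains:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CommonTokenFactory;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.UnbufferedCharStream;

try (InputStream inputStream = new FileInputStream(file)) {
    // Reads the file in small chunks instead of buffering all 200 MB.
    CharStream charStream = new UnbufferedCharStream(inputStream);
    Lexer lexer = new MySqlLexer(charStream);
    // copyText=true: tokens copy their text eagerly, because the
    // unbuffered stream discards characters once they are consumed.
    lexer.setTokenFactory(new CommonTokenFactory(true));
    Token token;
    while ((token = lexer.nextToken()).getType() != Token.EOF) {
        // ... process token
    }
} catch (Exception e) {
    // Handle exception
}
```

From the runtime source, `CharStreams.fromStream` appears to decode the entire input into memory up front, which may explain the spike in my original code; the sketch above avoids that by swapping in `UnbufferedCharStream`.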