Parser failed unexpectedly #4778

Open
CBalaa opened this issue Feb 19, 2025 · 1 comment

CBalaa commented Feb 19, 2025

I built my parser with the C++ target, but running it fails with the error: Segmentation fault (core dumped). When I build the same grammar with the Python target, it works normally. My ANTLR4 tool version is 4.13.2.

When I change the rule extra_decls: Less extra_decl (Comma extra_decl)* Greater; to extra_decls: LeftBrace extra_decl (Comma extra_decl)* RightBrace;, or change atom_suffix: Less identifier Greater; to atom_suffix: LeftBrace identifier RightBrace;, it works.

I think there may be something wrong with the ANTLR4 C++ runtime. I would like to know why this happens and how to fix it.

The following are my grammar and test code:

UniASTParser.g4

parser grammar UniASTParser;

options {
    tokenVocab = UniASTLexer;
}

rules
    : ruleSpec+ EOF
    ;

ruleSpec
    : parserRuleSpec
    ;

parserRuleSpec
    : ParserRuleName Colon ruleBlock extra_decls? Semi
    ;

extra_decls
    : Less extra_decl (Comma extra_decl)* Greater
    ;

extra_decl
    : identifier Assign identifier Less (identifier | StringLiteral) Greater
    ;

ruleBlock
    : ruleAltList
    ;

ruleAltList
    : actionAlt
    ;

actionAlt
    : alternative
    ;

alternative
    : element+
    ;

element
    : atomOrGroup
    ;

atomOrGroup
    : atom 
    ;

atom_suffix: Less identifier Greater;

atom
    : terminalDef atom_suffix?
    | ruleref atom_suffix?
    ;

terminalDef
    : LexerRuleName
    | StringLiteral
    ;

ruleref
    : ParserRuleName
    ;

identifier
    : LexerRuleName
    | ParserRuleName
    ;

UniASRLexer.g4

lexer grammar UniASTLexer;

fragment NONDIGIT: [a-zA-Z_];

fragment UPPERCASE: [A-Z];

fragment LOWERCASE: [a-z];

fragment ALLCASE: [a-zA-Z0-9_];

// string literal
fragment SHORT_STRING_LITERAL:
	'\'' SHORT_STRING_ITEM_FOR_SINGLE_QUOTE* '\''
	| '"' SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE* '"';

fragment SHORT_STRING_ITEM_FOR_SINGLE_QUOTE:
	SHORT_STRING_CHAR_NO_SINGLE_QUOTE
	| STRING_ESCAPE_SEQ;
fragment SHORT_STRING_ITEM_FOR_DOUBLE_QUOTE:
	SHORT_STRING_CHAR_NO_DOUBLE_QUOTE
	| STRING_ESCAPE_SEQ;

fragment SHORT_STRING_CHAR_NO_SINGLE_QUOTE: ~[\\\r\n'];

fragment STRING_ESCAPE_SEQ: '\\' OS_INDEPENDENT_NL | '\\' .;

fragment SHORT_STRING_CHAR_NO_DOUBLE_QUOTE: ~[\\\r\n"];

fragment OS_INDEPENDENT_NL: '\r'? '\n';

fragment LONG_STRING_LITERAL:
	'\'\'\'' LONG_STRING_ITEM*? '\'\'\''; // | '"""' LONG_STRING_ITEM*? '"""'

fragment LONG_STRING_ITEM: LONG_STRING_CHAR | STRING_ESCAPE_SEQ;

fragment LONG_STRING_CHAR: ~'\\';

// integer literal
fragment INTEGER:
	DEC_INTEGER
	| BIN_INTEGER
	| OCT_INTEGER
	| HEX_INTEGER;
fragment DEC_INTEGER:
	NON_ZERO_DIGIT ('_'? DIGIT)*
	| '0'+ ('_'? '0')*;
fragment BIN_INTEGER: '0' ('b' | 'B') ('_'? BIN_DIGIT)+;
fragment OCT_INTEGER: '0' ('o' | 'O') ('_'? OCT_DIGIT)+;
fragment HEX_INTEGER: '0' ('x' | 'X') ('_'? HEX_DIGIT)+;
fragment NON_ZERO_DIGIT: [1-9];
fragment DIGIT: [0-9];
fragment BIN_DIGIT: '0' | '1';
fragment OCT_DIGIT: [0-7];
fragment HEX_DIGIT: DIGIT | [a-f] | [A-F];

// floatpoint literal
fragment FLOAT_NUMBER: POINT_FLOAT | EXPONENT_FLOAT;
fragment POINT_FLOAT: DIGIT_PART? FRACTION | DIGIT_PART '.';
fragment EXPONENT_FLOAT: (DIGIT_PART | POINT_FLOAT) EXPONENT;
fragment DIGIT_PART: DIGIT ('_'? DIGIT)*;
fragment FRACTION: '.' DIGIT_PART;
fragment EXPONENT: ('e' | 'E') ('+' | '-')? DIGIT_PART;

// boolean literal
fragment TRUE: 'True';
fragment FALSE: 'False';

// key words

FEGEN: 'fegen';

DEF: 'def';

INPUTS: 'inputs';

RETURNS: 'returns';

ACTIONS: 'actions';

IR: 'ir';

OPERAND_VALUE: 'operandValue';

ATTRIBUTE_VALUE: 'attributeValue';

CPP_VALUE: 'cppValue';

OPERATION: 'operation';

FUNCTION: 'function';

TYPEDEF: 'typedef';

OPDEF: 'opdef';

ARGUMENTS: 'arguments';

RESULTS: 'results';

BODY: 'body';

EMPTY: 'null';

PARAMETERS: 'parameters';

ASSEMBLY_FORMAT: 'assemblyFormat';

CLASS: 'class';

SELF: 'self';

// types
TYPE: 'Type';

BOOL: 'bool';

INT: 'int';

FLOAT: 'float';

STRING: 'string';

LIST: 'list';

MAP: 'map';

// stmt

IF: 'if';

ELIF: 'elif';

ELSE: 'else';

FOR: 'for';

IN: 'in';

WHILE: 'while';

RETURN: 'return';

VARIABLE: 'variable';

// marks

AND: 'and';

OR: 'or';

NOT: 'not';

IS: 'is';

Equal: '==';

NotEq: '!=';

Less: '<';

LessEq: '<=';

Greater: '>';

GreaterEq: '>=';

AT: '@';

DivDiv: '//';

Comma: ',';

Semi: ';';

LeftParen: '(';

RightParen: ')';

LeftBracket: '[';

RightBracket: ']';

LeftBrace: '{';

RightBrace: '}';

Dot: '.';

Colon: ':';

AlterOp: '|';

QuestionMark: '?';

Star: '*';

Div: '/';

Plus: '+';

Minus: '-';

Assign: '=';

StarStar: '**';

MOD: '%';

Arror: '->';

Tilde: '~';

Range: '..';

// literal

StringLiteral: SHORT_STRING_LITERAL | LONG_STRING_LITERAL;

BoolLiteral: TRUE | FALSE;

IntegerLiteral: INTEGER;

FloatPointLiteral: FLOAT_NUMBER;

// identifiers

LexerRuleName: UPPERCASE (NONDIGIT | DIGIT)*;

ParserRuleName: LOWERCASE (NONDIGIT | DIGIT)*;


Whitespace: [ \t]+ -> skip;

Newline: ('\r' '\n'? | '\n') -> skip;

BlockComment: '/*' .*? '*/' -> skip;

LineComment: '//' ~ [\r\n]* -> skip;

test.cpp

#include "UniASTLexer.h"
#include "UniASTParser.h"
#include "antlr4-runtime.h"
#include <fstream>
#include <iostream>

using namespace antlr4;
using namespace std;

int main(int argc, char **argv) {
  if (argc != 2) {
    cerr << "no input file" << endl;
    return 1;
  }
  // Open the test file named on the command line.
  std::ifstream stream;
  stream.open(argv[1]);
  // Lexer -> token stream -> parser, starting at the top-level rule `rules`.
  ANTLRInputStream input(stream);
  UniAST::UniASTLexer lexer(&input);
  CommonTokenStream tokens(&lexer);
  UniAST::UniASTParser parser(&tokens);
  std::cout << parser.rules()->getText() << std::endl;
  return 0;
}

test file

module: assign_stmt{assign_stmt};
assign_stmt: variable_access{variable_access} Assign expression{expression};

kaby76 commented Feb 19, 2025

The parser cannot match the {assign_stmt}; part of your input, because the parser grammar does not use LeftBrace, or the string literal '{', anywhere. Also, UniASRLexer.g4 is not the correct file name. It should be UniASTLexer.g4, because the grammar is declared as lexer grammar UniASTLexer;. So it appears you have a build issue. I recommend writing a build script that cleans up all generated files, runs the Antlr tool, and recompiles everything from scratch. Since you are using the Cpp target, write a Bash script with the two Antlr tool calls in the correct order, followed by g++ or whatever compiler you are using. The segv suggests that your compiler and linker flags are not correct, e.g., you are not compiling with pthreads.
