UTF-8 characters are not managed properly #320

@mingodad

While testing this JavaScript grammar https://github.com/pegjs/pegjs/blob/master/examples/javascript.pegjs converted to peglib, I noticed that peglib doesn't handle single-character Unicode literals properly (as shown below): it stores the character internally as a char, without any warning or error. See here:

char ch_;
where it should be char32_t, like here:
std::vector<std::pair<char32_t, char32_t>> ranges_;

LineTerminatorSequence #"end of line"
  <- "\n"
  / "\r\n"
  / "\r"
  / "\u2028" #silently truncated
  / "\u2029" #silently truncated
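
To illustrate why a `char` member cannot hold these two alternatives, here is a minimal sketch. It assumes the truncation is a plain narrowing conversion that keeps only the low byte of the code point; the issue does not confirm exactly where peglib loses the value. `narrow_to_char` and `encode_utf8` are hypothetical helpers written for this example, not part of peglib's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not peglib API): models what survives when a code
// point is stored in an 8-bit char, assuming only the low byte is kept.
inline unsigned char narrow_to_char(char32_t cp) {
  return static_cast<unsigned char>(cp & 0xFF);
}

// Hypothetical helper (not peglib API): encodes one code point as UTF-8,
// which is what the matcher would need to compare against the input bytes.
// Assumes cp is a valid Unicode scalar value (no surrogate/range checks).
inline std::string encode_utf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

Under that assumption, `narrow_to_char(U'\u2028')` yields `0x28`, which is `'('`, and U+2029 yields `')'`, so the rule would match parentheses instead of line separators. Kept as a `char32_t`, U+2028 encodes to the three bytes `E2 80 A8`.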
