uutils AWK progress report #16

@Alonely0

I will be updating this issue as we make progress. Please feel free to pick up any of the good first issues.

TODO

  • Lexer: Numeric escapes \u and \x. Good first issue.
  • Parser: Extend spans during Pratt parsing for better error messages (trivial-ish?).
  • Parser: The preprocessor is TBD (not complicated, but will tangle up pretty printing).
  • Parser: It would be nice to reduce LOC.
  • Parser: Start running gawk parsing tests at some point (especially when we get a basic interpreter and nail down --pretty-print).
  • Lexer & parser: typed regexes and indirect function calls.
  • Interpreter: We plan to start with a basic tree-walking interpreter, both to get integration testing going and to serve as a baseline for future iterations. Ideally, those later iterations would be a bytecode machine or a JIT. The design sketch is a cooperative I/O machine, probably built with smol; if we want to better support AWK's long-forgotten number-crunching intent, we could easily extend this to parallel computations.
  • Lexer & parser: We must add built-in functions as standalone tokens, as well as any missing variables. Use the POSIX utility functions where needed. Good first issue.
  • Lexer: Add more tests. Good first issue.
  • Parser: Add smell tests. Good first issue.
  • Parser: Fuzzing.
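
For the numeric escapes item above, here is a minimal sketch of how the hex digits after \x or \u could be decoded. It is not the project's lexer: the function name decode_hex_escape, its signature, and the digit limits are assumptions made for illustration (recent gawk reads at most two hex digits after \x and at most eight after \u, but the limits for this project are not decided here). The sketch also maps the value to a Unicode scalar, whereas a real lexer may want raw bytes for \x.

```rust
/// Hypothetical helper: decodes the hex digits that follow `\x` or `\u`.
///
/// `rest` is the input immediately after the escape letter, and
/// `max_digits` caps how many hex digits are consumed (capped at 8 so the
/// accumulated value always fits in a `u32`).
///
/// Returns the decoded character and the number of bytes consumed, or
/// `None` if there is no hex digit or the value is not a valid Unicode
/// scalar (for example, a surrogate).
fn decode_hex_escape(rest: &str, max_digits: usize) -> Option<(char, usize)> {
    let mut value: u32 = 0;
    let mut consumed = 0;

    for c in rest.chars().take(max_digits.min(8)) {
        match c.to_digit(16) {
            Some(d) => {
                value = value * 16 + d;
                // Hex digits are ASCII, so one char is one byte.
                consumed += 1;
            }
            None => break,
        }
    }

    if consumed == 0 {
        return None;
    }
    char::from_u32(value).map(|ch| (ch, consumed))
}
```

With these assumptions, decode_hex_escape("41rest", 2) returns Some(('A', 2)), and decode_hex_escape("zz", 2) returns None, which the lexer could turn into a diagnostic or a literal fallback.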

Known issues

Low priority

  • Lexer: (arguably) incorrectly parses concatenated regexes like print a /x/. Note that gawk is probably the only awk implementation that parses this as a concatenation; mawk and bwk bail on it, afaik. Adding another state to the lexer context might be enough, or at least go a long way.
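
To make the ambiguity concrete, here is a sketch of the common previous-token heuristic for deciding whether a slash is division or the start of a regex. The names PrevToken, SlashKind, and classify_slash are hypothetical and do not come from the project's lexer; the token categories are simplified assumptions.

```rust
/// Hypothetical summary of the token lexed just before a `/`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrevToken {
    /// Nothing has been lexed yet.
    Start,
    /// Identifier, number, string, `)`, or `]`.
    Operand,
    /// Binary or unary operator, `(`, `,`, `{`, or `;`.
    Operator,
    /// A keyword such as `print` or `return`.
    Keyword,
}

/// How the lexer should treat a `/` it has just encountered.
#[derive(Debug, PartialEq)]
enum SlashKind {
    Division,
    RegexStart,
}

/// Previous-token heuristic: a slash right after an operand is division,
/// anywhere else it opens a regex literal.
fn classify_slash(prev: PrevToken) -> SlashKind {
    match prev {
        PrevToken::Operand => SlashKind::Division,
        PrevToken::Start | PrevToken::Operator | PrevToken::Keyword => SlashKind::RegexStart,
    }
}
```

Under this heuristic, the first slash in print a /x/ follows the operand a and is classified as division, while in print /x/ it follows a keyword and opens a regex. Supporting the concatenation reading would need state beyond the previous token; what that state should be is left open here.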
