Add options so that a JavaScript-aware (or theoretically, Rust-aware) parser can wrap this #8

Closed
wooorm opened this issue Sep 13, 2022 · 1 comment

Comments

@wooorm
Owner

wooorm commented Sep 13, 2022

micromark-rs can parse MDX. MDX is markdown (minus some features) plus some features.

MDX includes expressions (and expressions inside JSX), which can be parsed in one of two ways:

  1. agnostic to any programming language (the default, as this project is not aware of a programming language), in which case braces are counted: {xxx} is a whole expression, and so is the same substring in {xxx}yyy}. This is useful for people who want to use this project with components and support only variables (props?), without any code being evaluated.
  2. gnostic (“aware”) of a particular programming language (most likely JavaScript through SWC, but theoretically also Rust or so), in which case whatever wraps this project must pass parsing functions in: {xxx} is valid, and so is {'a}b'} (if JS-aware), but {'a}b'} would be an error for Rust-aware expressions.
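
To make the agnostic mode concrete, here is a minimal sketch of brace counting. The function name `agnostic_expression_end` is hypothetical and not part of `micromark-rs`; it only illustrates the idea that, without language awareness, the first balanced closing brace ends the expression.

```rust
/// Hypothetical helper (not the project's API): given text that starts at an
/// opening brace, return the byte offset just past the brace that balances it.
/// Returns `None` if the braces never balance, or if a closing brace comes
/// before any opening brace.
fn agnostic_expression_end(input: &str) -> Option<usize> {
    let mut depth = 0usize;
    for (index, ch) in input.char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                // A closing brace with nothing open is not an expression.
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    // `}` is one byte long, so the expression ends here.
                    return Some(index + 1);
                }
            }
            // Quotes, comments, and so on are ignored: no language is assumed.
            _ => {}
        }
    }
    None
}
```

Under this sketch, `{xxx}` and the start of `{xxx}yyy}` both yield the same five-byte expression, while `{'a}b'}` is cut off after `{'a}`, which is exactly why a language-aware parser is needed for the second mode.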

MDX also includes ESM, which only makes sense when “gnostic to JS”. In the future we could perhaps support Rust keywords for that, instead of import/export, if we need it.

Here’s the direction of an API I’m thinking of, as pseudo-code:

enum Signal {
    /// A syntax error.
    /// `micromark-rs` will crash with error message `String`, and convert the
    /// `usize` (offset into `&str`) to where it happened in the whole document.
    /// E.g., `Unexpected `"`, expected identifier`.
    Error((String, usize)),
    /// An “error” at the end of the (partial?) expression
    /// `micromark-rs` will either crash with error message `String` if it
    /// doesn’t have any more text, or it will try again later when more text
    /// is available
    /// E.g., `Unexpected end of file in string literal`.
    Eof(String),
    /// Done, `micromark-rs` knows that this is the end of a valid
    /// expression/esm and continues with markdown.
    Ok,
}

enum Kind {
    /// For `# {Math.PI}` and `{Math.PI}`.
    Expression,
    /// For `<a {...b}>`
    AttributeExpression,
    /// For `<a b={c}>`.
    AttributeValueExpression,
}

/// * If `kind` is `Kind::AttributeExpression`, SWC can pass an error
///   back if there is no spread (but it can also do that later when making the
///   AST).
/// * If `kind` is `Kind::AttributeValueExpression`, SWC can pass an
///   error back if the expression is nothing/whitespace-only/comments-only (but
///   it can also do that later when making the AST).
fn parse_expression(expression: &str, kind: Kind) -> Signal;

/// * SWC can pass errors back when there is non-ESM found (e.g.,
///   `export var a = 1\nvar b = 2`), or do it when building the AST
/// * When building the AST, SWC needs to throw errors if identifiers are used
///   in different ESM blocks (`export var a = 1\n\n# hi\n\nexport var a = 2`)
fn parse_esm(program: &str) -> Signal;

/// micromark-rs will then call these “hooks” when it encounters expressions/esm
/// to pass off parsing to SWC.
/// For example, taking this markdown:
/// 
/// ```
/// export function a() {
///   return `b
///
///   c`
/// }
/// ```
/// 
/// …`parse_esm` will first be called with:
/// ``"export function a() {\nreturn `b"``.
/// ⏎ SWC will then pass back:
/// `Signal::Eof("Unexpected end of file in template literal, expected closing backtick".to_string())`
/// `micromark-rs` will then continue, and call it again with:
/// ``"export function a() {\nreturn `b\n\nc`}"``.
/// ⏎ SWC will then pass back `Signal::Ok`.
/// 
/// Two big questions:
/// * If SWC is “resumable” on EOF errors, `micromark-rs` could pass only
///   ``"\n\nc`}"`` for the 2nd call.
///   I know that Acorn doesn’t support this though, and it might get a bit
///   complex.
/// * I am not sure how to do this with Rust, but we need to find a way to
///   “save” the partial ASTs that SWC produces for each expression/esm.
///   One way is for SWC to define a sort of `Ok<T>`, and for
///   `micromark-rs` to save that in an array or on events or so?
///   Another way is for `parse_expression`/`parse_esm` to be called with some
///   unique identifier/start position/incremented number, and then SWC needs
///   to store those partial ASTs somewhere?
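
The retry-on-`Eof` flow described in the comments above could look roughly like the following. This is only a sketch under stated assumptions: `consume_esm` is a hypothetical driver, not how `micromark-rs` works, and `toy_parse_esm` is a stand-in for SWC that merely checks whether template-literal backticks are balanced. The `Signal` enum mirrors the pseudo-code.

```rust
/// Mirrors the `Signal` enum from the pseudo-code above.
#[derive(Debug)]
enum Signal {
    Error((String, usize)),
    Eof(String),
    Ok,
}

/// Hypothetical stand-in for a JS-aware parser such as SWC: it only checks
/// that the number of backticks is even, nothing else.
fn toy_parse_esm(program: &str) -> Signal {
    if program.matches('`').count() % 2 == 0 {
        Signal::Ok
    } else {
        Signal::Eof("Unexpected end of file in template literal".to_string())
    }
}

/// Hypothetical driver: offer the text up to the next blank line to the hook,
/// and on `Eof` grow the text to the following blank line and try again.
/// Returns the number of bytes of `document` that belong to the ESM block.
fn consume_esm(document: &str, parse_esm: impl Fn(&str) -> Signal) -> Result<usize, String> {
    let mut search_from = 0;
    loop {
        // The candidate ends at the next blank line, or at the end of the text.
        let (end, more_text) = match document[search_from..].find("\n\n") {
            Some(offset) => (search_from + offset, true),
            None => (document.len(), false),
        };
        match parse_esm(&document[..end]) {
            Signal::Ok => return Ok(end),
            Signal::Error((message, offset)) => {
                return Err(format!("{message} (at offset {offset})"));
            }
            // No more text to try with: the "error" becomes a real error.
            Signal::Eof(message) if !more_text => return Err(message),
            // Otherwise skip past the blank line and retry with more text.
            Signal::Eof(_) => search_from = end + 2,
        }
    }
}
```

With the example markdown from the comments, the first call sees the text up to the `` `b `` line and gets `Eof`; the second call sees the whole function and gets `Ok`. This sketch always passes the full text from the start, so it does not rely on the parser being resumable.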
@wooorm
Owner Author

wooorm commented Sep 19, 2022

Done in fe618ff.

@wooorm wooorm closed this as completed Sep 19, 2022