How to compile MDX string to JS function, rather than MDX file to JS module #23

Closed
lunelson opened this issue Mar 13, 2021 · 21 comments · Fixed by #30

@lunelson

I just found this project and I'm excited about moving past some of the issues that seem to have stalled at @mdx-js, but I'm not sure how to work with MDX source code that doesn't come from a file.

I'm working with MDX content from a CMS, so I'm doing a two-step process:

  1. compile the MDX at the data-fetching stage to a string (requires both @mdx-js/mdx and @babel/core)
  2. receive this string as props, and use an MDXRenderer component (which I also wrote myself, but the API is modelled on gatsby-plugin-mdx), which takes the code (as well as optional components prop), to execute it as a function.

Can I do this with xdm? The documentation seems aimed at .mdx files which are imported and thus compiled to modules, but I need a component function. My MDX source strings won't contain any import or export rules, but they will contain components that I'll need to pass in through a context provider or to their components prop.

@lunelson
Author

Essentially, I think I'm asking: can I compile to a string which has no import or export rules, and just supply these in a new Function() statement, similar to this function in mdx-bundler?

@ChristianMurphy
Collaborator

Yes.
As noted in the docs:
https://github.com/wooorm/xdm#compilefile-options returns JavaScript as a string.
https://github.com/wooorm/xdm#evaluatefile-options returns a runnable function/component.
In both cases the first parameter, file, can be either a virtual file or a string.

@ChristianMurphy ChristianMurphy added the question Further information is requested label Mar 13, 2021
@lunelson
Author

Thanks @ChristianMurphy but evaluate sounds like it will also run the entire compilation chain, including remark and rehype, in my front end. I'd like to compile in my server-side process, and only evaluate the compiled code on the client. The thing is I want to compile to a function body string, not a module body string. I also need to be able to inject components or provide them via context. Would this be possible?

@wooorm
Owner

wooorm commented Mar 13, 2021

evaluate sets a _contain: true option.
If you use compile and also use that semi-hidden option, you get a function body out.

@lunelson
Author

Boom! Thanks @wooorm that's what I was looking for!!

@wooorm
Owner

wooorm commented Mar 13, 2021

This is a somewhat interesting use case that’s not yet supported in the API, but the tools are there.

Take a look at how evaluate works: https://github.com/wooorm/xdm/blob/main/lib/evaluate.js.
You don’t need to use such a splitOptions function of course, as you’re passing stuff yourself.
Make sure to pass the runtime (as documented in the readme, for evaluate) to the generated function body when wrapping it.
You should also probably use an AsyncFunction on the client to eval your function body:

xdm/lib/run.js

Line 10 in f9c3108

export async function run(file, options) {
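
A minimal sketch of the wrapping step described above, with no xdm or React involved. The functionBody string is hand-written to imitate what a compiled function body might look like (an assumption; real compiler output is longer and differs in detail), and fakeRuntime is a hypothetical stand-in for a JSX runtime. The body reads the runtime from its first argument and returns an object with a default export, which matches how runSync is consumed later in this thread.

```javascript
// Hand-written stand-in for a compiled function body (assumption: real
// compiler output is longer and differs in detail).
const functionBody = `
  const {jsx} = arguments[0];
  function MDXContent(props) {
    return jsx('h1', {children: 'Hello, ' + props.name});
  }
  return {default: MDXContent};
`;

// Hypothetical runtime stub; a real app would pass its framework's JSX runtime.
const fakeRuntime = {
  jsx(type, props) {
    return {type, props};
  }
};

// Wrap the body in a function and hand it the runtime as the first argument.
function runBodySync(body, runtime) {
  return new Function(body)(runtime);
}

const {default: Content} = runBodySync(functionBody, fakeRuntime);
console.log(Content({name: 'MDX'}));
```

The same shape works with an AsyncFunction constructor when the body needs to await something, at the cost of getting a promise back.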

@wooorm
Owner

wooorm commented Mar 13, 2021

export does work in xdm btw.
If you want imports on the client (e.g., from unpkg or whatever), that can be supported if you pass a baseUrl (see readme)

@lunelson
Author

Cool! I'll take a deeper look into this this week. Maybe I can contribute to your README eventually, for other users who have this use-case 😄 !

@wooorm
Owner

wooorm commented Mar 13, 2021

Would appreciate that!

The reason that option starts with an underscore is that I do not yet know how this interface should look. The parts are there (compile and run), but I don't know how to make an intuitive api for it (yet). So I'd appreciate your feedback

@stevejcox

@lunelson I have the exact same use case coming up this week. Did you have any luck?

@lunelson
Author

@wooorm —CC @stevejcox— so for my Next.js use-case I ended up with the following two functions: the first runs in getStaticProps (which is removed from the front-end bundle), the second runs in the Page component on the received data.

A few questions and observations:

  • I couldn't figure out how to use the async run function here, so I used runSync instead -- is there a better way?
  • I wasn't sure if I was using the right runtime — is the one I'm importing here always in React? I'd like to avoid additions to the front-end bundle of course, and also I'm wondering whether the compiled string can be more minimal: the reason is that since the compiled function is being passed as data, it will end up as JSON blobs in the app rather than JS files, so it won't be minified in the production build. One could assign Fragment, jsx, jsxs, useMDXComponents, _missingComponent etc. to single-letter variable names, for example.

Thoughts?

// /lib/markdown.js

import { useMemo } from 'react';
import * as runtime from 'react/jsx-runtime.js';
import { useMDXComponents } from '@mdx-js/react';
import { runSync } from 'xdm/lib/run';
import { compile } from 'xdm';
import remarkGfm from 'remark-gfm';

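// Runs at build time (e.g. in getStaticProps): compile MDX source to a
// function-body string instead of a module.
export function compileMDXFunction(mdx) {
  return compile(mdx, {
    format: 'mdx',
    _contain: true,
    providerImportSource: '@mdx-js/react',
    remarkPlugins: [remarkGfm],
  }).then((file) => file.toString()); // `compile` resolves to a virtual file
}

// Runs in the page component: evaluate the function body with the JSX runtime
// and return the resulting MDX content component.
export function useMDXFunction(code) {
  return useMemo(() => {
    const { default: Component } = runSync(code, {
      ...runtime,
      useMDXComponents,
    });
    return Component;
  }, [code]);
}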

FYI, the test Next.js page component:

// /pages/index.jsx

import { compileMDXFunction, useMDXFunction } from '../lib/markdown';
import { MDXProvider } from '@mdx-js/react';

export default function Page({ code }) {
  const MDXContent = useMDXFunction(code);
  return (
    <div>
      <MDXProvider
        components={{
          Foo({ children }) {
            return (
              <p>
                this is the foo component with <span>{children}</span>
              </p>
            );
          },
          wrapper(props) {
            return <div style={{ backgroundColor: 'lightblue' }} {...props} />;
          },
        }}
      >
        <h1>xdm testing</h1>
        <h2>rendered</h2>
        <MDXContent />
        <h2>function body</h2>
        <pre>
          <code>{code}</code>
        </pre>
      </MDXProvider>
    </div>
  );
}

export async function getStaticProps() {
  const code = await compileMDXFunction(
    `
h1 hello mdx

This is ~~some GFM content~~

<Foo>content</Foo>

  `
  );
  return {
    props: {
      code,
    },
  };
}

The resulting view:

[image: screenshot of the rendered page]

@lunelson lunelson changed the title How to work with non-file MDX sources? How to compile MDX string to JS function, rather than MDX file to JS module Mar 18, 2021
@wooorm
Owner

wooorm commented Mar 18, 2021

Nice!

  1. You don’t need to use the MDX provider in this case. You can pass components in directly:
// …
<div>
  <h1>xdm testing</h1>
  <h2>rendered</h2>
  <MDXContent
    components={{
      Foo({ children }) {
        return (
          <p>
            this is the foo component with <span>{children}</span>
          </p>
        );
      },
      wrapper(props) {
        return <div style={{ backgroundColor: 'lightblue' }} {...props} />;
      },
    }}
  />
  <h2>function body</h2>
  <pre>
    <code>{code}</code>
  </pre>
</div>
// …
  2. Not that important, but might help your understanding: if you really want that provider, the value that providerImportSource is set to doesn’t matter. It does still matter that it’s set, though: providerImportSource: '#' would be fine.

  3. wrapper also gets components in its props, so you might want to pick that out to fix <div … components="[object Object]">

  4. I wasn't sure if I was using the right runtime

    You explicitly load import * as runtime from 'react/jsx-runtime.js';, what other runtime could it be 😅

  5. I couldn't figure out how to use the async run function here

    https://stackoverflow.com/questions/61751728/asynchronous-calls-with-react-usememo

  6. All together, your compile function could look like:

export async function compileMDXFunction(mdx) {
  return String(await compile(mdx, {
    _contain: true,
    providerImportSource: '#',
    remarkPlugins: [remarkGfm],
  }))
}
  7. minification

    Terser seems to be able to work with estrees (https://github.com/terser/terser#estree--spidermonkey-ast), which we’re using here (through the new recma ecosystem). So it should definitely be possible to make a recma plugin that minifies using terser.
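
The point about wrapper receiving components in its props can be sketched without JSX. wrapperProps is a hypothetical helper name used only for illustration; it separates components from the props that would be forwarded to the underlying element, so that no components attribute ends up on the div.

```javascript
// Hypothetical helper: drop `components` and keep the props that should be
// forwarded to the underlying element.
function wrapperProps({components, ...rest}) {
  return rest;
}

const forwarded = wrapperProps({
  components: {Foo() {}},
  children: 'content'
});
console.log(Object.keys(forwarded));
```

In the page component from this thread, that would amount to destructuring components out of the wrapper's parameters before spreading the rest onto the div.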

@lunelson
Author

if you really want that provider ... '#', would be fine.

Yep, that's what I figured from looking at the output, I just put providerImportSource: true 👍

wrapper also gets components in its props, so you might want to pick that out to fix <div … components="[object Object]">

Good tip, I missed that one! 😄

Terser seems to be able to work with estrees...should definitely be possible to make a recma plugin that minifies using terser.

That would be a really nice addition. I'm not familiar with how this would work but maybe I'll find time to dig into it at some point.

Anyway thanks again, this worked really well. FWIW I found a couple of caveats with Next.js, because you have to tell it to transpile ESM dependencies specifically (I had to use the next-transpile-modules package, and include both xdm and unist-util-position-from-estree in the list), and you have to be careful that you don't end up with Node packages in your client-side bundle (at first I ended up with acorn in the bundle, until I copied the runSync function out to my own file instead of importing it).

As for the API, I think the _contain option could perhaps be called asFunctionBody (?), and for integration with certain frameworks you might consider exporting hooks like the useMDXFunction one that I made... although perhaps this is a bit too opinionated at this level. If you did decide to do this, you'd have to be careful that server-side dependencies don't end up in the client bundle; it's probably good to export from a completely separate path like xdm/client to mitigate this possibility, though in all fairness Next.js needs to resolve their non-support of ESM dependencies at this point.

@lunelson
Author

P.S. Let me know, if you'd like me to contribute to the README about this use-case

@wooorm
Owner

wooorm commented Mar 20, 2021

That would be a really nice addition. I'm not familiar with how this would work but maybe I'll find time to dig into it at some point.

You can also probably use terser outside of unified/xdm. Take the string, use terser and probably configure it to support top-level return statements (if possible), and get a minified output.

Anyway thanks again, this worked really well. FWIW I found a couple of caveats with Next.js, because you have to tell it to transpile ESM dependencies specifically (I had to use the next-transpile-modules package, and include both xdm and unist-util-position-from-estree in the list)

That’s an issue that Next needs to solve. The ecosystem is moving soon (unifiedjs/unified#121 (comment)), and they don’t support it yet.

and you have to be careful that you don't end up with Node packages in your client-side bundle (at first I ended up with acorn in the bundle, until I copied the runSync function out to my own file instead of importing it).

RSC, which is far from ready but Next is also working on, solves this.
Also sounds like a Next bug. They should be able to tree shake. (reading the rest of the comment, yep, what you said with “though in all fairness Next.js needs to resolve their non-support of ESM dependencies at this point”)

As for the API, I think the _contain option could perhaps be called asFunctionBody (?)

It definitely needs a better name. I somewhat like asFunctionBody because it describes what it does. But on the other hand I’m not sure users will understand what it means.
Maybe outputFormat: 'file' | 'function-body'?

I also need to figure out how to make baseUrl work in both output formats. That’s not related to how you’re using xdm, but does relate to solving this nicely.

and for integration with certain frameworks you might consider exporting hooks like the useMDXFunction one that I made... although perhaps this is a bit too opinionated at this level.

Aside: I think the function you have now is more complex than needed. You’re including 1kb of JS to get a provider, so you can do <MDXProvider components={{…}}><MDXContent /></MDXProvider> instead of the shorter <MDXContent components={{…}} />? It doesn’t make sense to me.

Also, I don’t get the useMemo, assuming you still have it. Upon some further reading, why not use useEffect such as described here: facebook/react#14326.

Other than these two thoughts, I think those functions can live in userland!

@lunelson
Author

Maybe outputFormat: 'file' | 'function-body'?

How about outputFormat: 'module' | 'function' then, or outputFormat: 'module-body' | 'function-body'—since this is essentially the difference right?

You’re including 1kb of JS to get a provider

Yes I probably don't need it. I guess I was aiming for parity with existing solutions/patterns, Gatsby etc. Maybe I'll make this an option in my compiler function which defaults to false.

Also, I don’t get the useMemo, assuming you still have it.

I took this from KCD's README for mdx-bundler, he shows usage of his getMDXComponent function this way, so it seemed like a good idea. 🤷‍♂️

Upon some further reading, why not use useEffect such as described here: facebook/react#14326.

That's an interesting thought: so you mean write a hook that uses runAsync in combination with useState and useEffect? Would that allow multiple components to run compiled MDX more-or-less-concurrently with better performance then?

Other than these two thoughts, I think those functions can live in userland!

For sure. I'm thinking about writing a post on dev.to about this because I know this use-case is a thing for Next.js users, and there's a need for a really up-to-date solution for both file-/(module-) and string/(function-)based MDX sources.

@wooorm
Owner

wooorm commented Mar 20, 2021

How about outputFormat: 'module' | 'function' then, or outputFormat: 'module-body' | 'function-body'—since this is essentially the difference right?

That’s a great idea, much better! Taking it further, how about outputFormat: 'program' | 'function-body'?

The word “program” is used by estree (the JS AST used by Firefox, Babel, ESLint, and much more) to represent the whole. Whether such a program is a module or a script depends on the environment (.mjs or .cjs; type="module" or type="text/javascript" on <script> elements), and that choice is recorded on the program node (as program.sourceType: 'module' | 'script').
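
For illustration, here is roughly what that looks like as data: a minimal sketch of two ESTree Program nodes with empty bodies. The variable names are made up, and real trees produced by a parser carry more fields (positions, comments, and so on).

```javascript
// Minimal ESTree `Program` nodes: same shape, different `sourceType`.
const asModule = {type: 'Program', sourceType: 'module', body: []};
const asScript = {...asModule, sourceType: 'script'};

console.log(asModule.sourceType, asScript.sourceType);
```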

I also think that program is explicit enough, -body is not needed there. On the other hand, function sounds like it includes function (args) { ... } or so, which it doesn’t, so I think I prefer that to be an explicit function-body.

Then the next thing to do would be to split baseUrl, which currently both turns import statements into a dynamic import() and also resolves them, into two things.

import -> import() is most useful in function-body, but because dynamic import() is available in scripts too, and assuming top-level await (stage 3 proposal) lands, then program could yield a file that can work in .cjs files!
This could either be a) outputType: 'script' | 'module' or b) importStatements: false (defaulting to true)

Then baseUrl needs to work on both import statements and dynamic import().
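
As a rough illustration of the "resolve" half only: relative specifiers can be resolved against a base with the standard URL constructor. resolveSpecifier is a hypothetical helper, not xdm's implementation, and the URLs are made up.

```javascript
// Hypothetical helper: resolve relative specifiers against a base URL and
// leave bare specifiers (such as package names) untouched.
function resolveSpecifier(specifier, baseUrl) {
  const isRelative = specifier.startsWith('./') || specifier.startsWith('../');
  return isRelative ? new URL(specifier, baseUrl).href : specifier;
}

const base = 'https://example.com/posts/intro.mdx';
console.log(resolveSpecifier('./data.js', base));
console.log(resolveSpecifier('react', base));
```

The same function applies whether the specifier comes from an import statement or from a dynamic import() call, which is why resolving can be treated separately from the statement-to-expression rewrite.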

Would that allow multiple components to run compiled MDX more-or-less-concurrently with better performance then?

I think so. It could be its own little module. You can publish it, too 😅. It gets such a “function-body” from xdm as a code parameter, then it asynchronously runs it.
Maybe something like this: https://github.com/streamich/react-use/blob/master/src/usePromise.ts.
Async is always slower than sync, but async is sometimes better.
Still: I’m not a React developer.
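
A framework-agnostic sketch of that idea, under stated assumptions: runBodyAsync and loadContent are hypothetical names, the body string is hand-written rather than real compiler output, and the cancel function plays the role that an effect cleanup would play in a React hook built with useState and useEffect.

```javascript
// The AsyncFunction constructor is not a global; derive it from an async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Evaluate a function body asynchronously, passing the runtime as the first argument.
function runBodyAsync(body, runtime) {
  return new AsyncFunction(body)(runtime);
}

// Hypothetical loader: starts evaluation, reports the component via `onReady`,
// and returns a cancel function so that a stale result is ignored.
function loadContent(body, runtime, onReady, onError) {
  let cancelled = false;
  runBodyAsync(body, runtime).then((exports) => {
    if (!cancelled) onReady(exports.default);
  }, onError);
  return () => {
    cancelled = true;
  };
}

// Hand-written stand-in for a compiled function body (assumption).
const asyncBody = `return {default: () => 'rendered'};`;
const cancel = loadContent(asyncBody, {}, (Content) => console.log(Content()));
```

In a hook, onReady would typically set state and the returned cancel function would be returned from the effect, so that a result for an outdated code string is dropped.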

For sure. I'm thinking about writing a post on dev.to about this because I know this use-case is a thing for Next.js users, and there's a need for a really up-to-date solution for both file-/(module-) and string/(function-)based MDX sources.

Nice! Yeah, maybe it’s a small hook. A couple lines. Then you don’t need to publish it, people can just copy-paste it in.

wooorm added a commit that referenced this issue Mar 20, 2021
This exposes the currently internal `_contain` option in the public interface,
which is used by `evaluate`, so that users can depend on it too.

Related to GH-23
wooorm added a commit that referenced this issue Mar 21, 2021
This exposes the currently internal `_contain` option in the public interface,
which is used by `evaluate`, so that users can depend on it too.

Related to GH-23
wooorm added a commit that referenced this issue Mar 22, 2021
This exposes the currently internal `_contain` option in the public interface,
which is used by `evaluate`, so that users can depend on it too.

Related to GH-23
Closes GH-26.

Reviewed-by: Christian Murphy <christian.murphy.42@gmail.com>
wooorm added a commit that referenced this issue Mar 22, 2021
Split previously experimental into two stable options:

*   `useDynamicImport` — compile import statements into dynamic import
    expressions
*   `baseUrl` — resolve relative import specifiers from a given URL

This additionally fixes two bugs in the `'function-body'` output format:

*   `export * from 'a'` was not supported
*   `export {a as b}` was inverted

Closes GH-23.
Related-to GH-26.
wooorm added a commit that referenced this issue Mar 24, 2021
Split previously experimental into two stable options:

*   `useDynamicImport` — compile import statements into dynamic import
    expressions
*   `baseUrl` — resolve relative import specifiers from a given URL

This additionally fixes two bugs in the `'function-body'` output format:

*   `export * from 'a'` was not supported
*   `export {a as b}` was inverted

Closes GH-23.
Related-to GH-26.
Closes GH-30.
@wooorm
Owner

wooorm commented May 9, 2021

For minification, I landed a PR in terser to add support for accepting and yielding our AST (ESTree).

import {compile} from './index.js'
import {minify} from 'terser'

var code = `export var Thing = () => <>World!</>

# Hello, <Thing />
`

console.log(String(await compile(code)))

console.log(String(await compile(code, {recmaPlugins: [recmaMinify]})))

function recmaMinify() {
  return transform
  async function transform(tree) {
    return (
      await minify(tree, {
        parse: {spidermonkey: true},
        format: {spidermonkey: true, code: false}
      })
    ).ast
  }
}

Yields:

/*@jsxRuntime automatic @jsxImportSource react*/
import {Fragment as _Fragment, jsx as _jsx, jsxs as _jsxs} from "react/jsx-runtime";
export var Thing = () => _jsx(_Fragment, {
  children: "World!"
});
function MDXContent(props) {
  const _components = Object.assign({
    h1: "h1"
  }, props.components), {wrapper: MDXLayout} = _components;
  const _content = _jsx(_Fragment, {
    children: _jsxs(_components.h1, {
      children: ["Hello, ", _jsx(Thing, {})]
    })
  });
  return MDXLayout ? _jsx(MDXLayout, Object.assign({}, props, {
    children: _content
  })) : _content;
}
export default MDXContent;

import {Fragment as _Fragment, jsx as _jsx, jsxs as _jsxs} from "react/jsx-runtime";
export var Thing = () => {
  return _jsx(_Fragment, {
    children: "World!"
  });
};
function MDXContent(n) {
  const t = Object.assign({
    h1: "h1"
  }, n.components), {wrapper: MDXLayout} = t, s = _jsx(_Fragment, {
    children: _jsxs(t.h1, {
      children: ["Hello, ", _jsx(Thing, {})]
    })
  });
  return MDXLayout ? _jsx(MDXLayout, Object.assign({}, n, {
    children: s
  })) : s;
}
export default MDXContent;

Note that this minifies props and such. This is not a formatter. If you also want to format, it becomes a bit more complex.

A nice alternative is running esbuild after xdm, which is super fast and can do all that too

@lunelson
Author

@wooorm thanks for this update! Interesting that you mention esbuild: I keep thinking about the best way to use this with Next.js (because of Next's poor support for ESM packages); do you think it's simpler to just use mdx-bundler in that case (it sounds like it handles the minification concern as well as others...)?

Otherwise, I was thinking of doing a package specifically for the Next.js use-case (something like "next-xdm"), which would be a Next.js plugin, exporting the webpack config but also the exports of xdm itself. I would have it built with esbuild using "node10" as a target.

@wooorm
Owner

wooorm commented May 13, 2021

You can use esbuild both to build xdm into a CJS bundle, and to run it on the results of xdm.
mdx-bundler does the last, plus provides some other things. But doing a Next-specific thing might be nice too?

@vikie1

vikie1 commented Feb 21, 2022

This thread is a saviour. Congrats @wooorm and @lunelson.
