Integration with external tools #1056

Closed
laurmaedje opened this issue May 1, 2023 · 126 comments
Labels

  • proposal: You think that something is a good idea or should work differently.
  • scripting: About Typst's coding capabilities

Comments

@laurmaedje
Member

laurmaedje commented May 1, 2023

Motivation

This RFC discusses mechanisms for interacting with tools or data outside the Typst document and its direct file system environment. Possible use cases:

  • Use external tooling, e.g. to generate a plot
  • Execute or test code blocks in another language, e.g. Python or Julia (Jupyter-notebook-style)
  • Integrate experiment data, possibly from a remote source

Available options

Preprocessing [already possible]

The simplest approach and the only approach that is possible right now is preprocessing your Typst files externally (e.g. by scanning for raw blocks with regex), as done in typst_pyimage. This approach is rather flaky and does not integrate well with content that is created programmatically.

Preprocessing + Queries [my current favorite]

Similar to bare preprocessing, but less flaky. The idea is to have the option to run queries against the document through the CLI. An external script could use this, for instance, to extract all raw blocks and generate some files that Typst can then read. The big benefit is that Typst remains completely safe while the external tool doesn't have to parse or understand Typst. This also works if content is created programmatically.
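
A minimal sketch of the external-script side of this approach, in Python. The RFC does not define the query command or its output format, so everything here is an assumption for illustration: the query result is taken to be a JSON array of objects with hypothetical `lang` and `text` fields, and `generate_outputs` and the `block-N.py` naming are made up.

```python
import json
import tempfile
from pathlib import Path


def generate_outputs(query_json: str, out_dir: Path, lang: str = "python") -> list:
    """Write one file per matching raw block and return the written paths.

    `query_json` is assumed (hypothetically) to look like
    [{"lang": "python", "text": "print(1)"}, ...].
    """
    blocks = json.loads(query_json)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    matching = (b for b in blocks if b.get("lang") == lang)
    for index, block in enumerate(matching):
        path = out_dir / f"block-{index}.py"
        path.write_text(block["text"], encoding="utf-8")
        written.append(path)
    return written


# Stand-in for the result of a document query (format is an assumption).
example = json.dumps([
    {"lang": "python", "text": "print(1 + 9)"},
    {"lang": "rust", "text": "fn main() {}"},
    {"lang": "python", "text": "print('plot')"},
])

with tempfile.TemporaryDirectory() as tmp:
    paths = generate_outputs(example, Path(tmp))
    names = [p.name for p in paths]
    contents = [p.read_text(encoding="utf-8") for p in paths]
```

The point of the design is that the script only consumes structured data and never parses Typst markup itself.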

Plugins

Define a clear plugin API to extend Typst, probably in the form of WebAssembly plugins. The plugin API could be as simple as accepting string -> string functions, or it could expose the whole computational model. There are two suboptions: completely encapsulated plugins that can just compute stuff, versus plugins that can do more dangerous stuff. The former option prevents a split between "safe" and "unsafe" plugins and would also allow plugins to be used in the web app (distributed through normal packages). Well-encapsulated plugins would avoid concerns about portability, security, or memoization. Only tools that can be compiled to WebAssembly could be integrated, so this would be a bit limited in what can be done. This option can also be combined with another one, like preprocessing + queries.

Shell Escape

Let Typst invoke commands on the local system shell. This way we could reuse tools written in arbitrary other programming languages. However, this makes Typst less portable as compilation would then heavily depend on the environment in question (e.g. through different shell syntax on Linux vs Windows, not to speak of the Web App). Furthermore, this is a security hazard as it can have arbitrary unwanted side effects on the system. Caching and memoization would also become much harder as we wouldn't know when to re-run the shell command. This would solve some problems, but introduce a whole bunch of new ones.

HTTP Escape

Provide a way to make a GET HTTP request to a local or remote service. The service may call to the shell internally, but shall expose a clear OS-independent interface. The service can itself decide when to re-run the command and communicate this to Typst through cache control headers. Aside from integration with external tools, this also allows access to data sources, for example with experiment data. For reproducibility, we could introduce some mechanism for storing the request results locally.

This feature would need to be explicitly enabled/configured, on a per-package basis. Similar to shell escape, this allows us to reuse tools written in arbitrary other programming languages, through a well-defined interface that is not OS dependent. It would also be more feasible to integrate into the web app than direct shell escape. In contrast to shell escape, caching and memoization could be handled through standard HTTP cache control. Of course, the service itself can be implemented in a problematic way (e.g. a service that simply runs an arbitrary shell process based on the request, with no regard for security or cache control). However, the user would then need to start that bad service in addition to Typst and toggle the CLI flag. For things that can be done with WebAssembly, plugins would of course be better. This system would extend Typst's flexibility to things that cannot be easily encapsulated into a WebAssembly plugin.
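
To make the cache-control idea concrete, here is a rough Python sketch of how a client could decide whether a stored response needs to be fetched again. It is not how Typst works or would work; the helper names are hypothetical, and only a small subset of the `Cache-Control` header (`max-age`, `no-store`, `no-cache`) is considered.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value of a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def needs_refetch(cache_control: str, age_seconds: float) -> bool:
    """Decide whether a cached response should be requested again.

    Simplification: `no-cache` really means "revalidate before reuse";
    this sketch treats it the same as a plain re-fetch.
    """
    directives = {d.strip().lower() for d in cache_control.split(",")}
    if "no-store" in directives or "no-cache" in directives:
        return True
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        # No freshness information at all: be conservative and re-fetch.
        return True
    return age_seconds >= max_age
```

With this shape, the service stays in control of re-runs: a response sent with `max-age=60` is reused for a minute, while `no-store` forces a new request on every compilation.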

Final Remarks

Please consider this as a request for comments in the truest sense of the word. I am not certain whether we want something like this and, if so, how we want to implement it, but I'd like to see meaningful discussion on how we can cater to these more complex use cases.

@laurmaedje laurmaedje added the rfc label May 1, 2023
@Luis-Licea
Contributor

What about a method for fetching external data? The method could have the following signature:

get(url: string, always: bool) -> path

Typst fetches the data and returns a path to the resource. Then it is up to the user to load the file appropriately:

let path = get("https://arbitrary_data.com?pandas=yes%20please", always: true)

let image_content = image(path, width: 50%)

let csv_data = csv(path)
let json_data = json(path)
let text_data = read(path)
let toml_data = toml(path)
let xml_data = xml(path)
let yaml_data = yaml(path)

Fetching resources is constrained to happening once before the document compilation begins. The compilation process looks like so:

  • If always is true, fetch the data and overwrite the previously requested files. Store the files in memory or a temporary folder like /tmp/typst/<sanitized_url>/file.
  • If always is false, only fetch the data if the file does not exist. Store files permanently in a folder like fetched/<sanitized_url>/file.
  • Continue remaining document processing as normal.
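
The storage rule in the list above can be sketched in Python as follows. This is only an illustration of the `always` semantics: the `fetch` callback, the root directories, and `sanitize_url` are hypothetical stand-ins, and the sketch does not model the "once before compilation begins" constraint.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable


def sanitize_url(url: str) -> str:
    """Turn a URL into a single directory name (illustrative rule only)."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", url).strip("_")


def get(url: str, always: bool, fetch: Callable[[str], bytes],
        temp_root: Path, permanent_root: Path) -> Path:
    """Return a local path for `url`, fetching according to `always`."""
    root = temp_root if always else permanent_root
    path = root / sanitize_url(url) / "file"
    if always or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(url))
    return path


# A fake fetcher so the sketch runs without network access.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return f"payload {len(calls)}".encode()


with tempfile.TemporaryDirectory() as tmp:
    temp_root = Path(tmp) / "tmp"
    permanent_root = Path(tmp) / "fetched"
    url = "https://example.com/data.csv?x=1"
    first = get(url, False, fake_fetch, temp_root, permanent_root).read_bytes()
    second = get(url, False, fake_fetch, temp_root, permanent_root).read_bytes()
    third = get(url, True, fake_fetch, temp_root, permanent_root).read_bytes()
    fourth = get(url, True, fake_fetch, temp_root, permanent_root).read_bytes()
```

Here the two calls with `always` set to false trigger a single fetch, while each call with `always` set to true fetches again and overwrites the temporary copy.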

@rpitasky
Contributor

rpitasky commented May 1, 2023

Selected paraphrased messages of mine from the extensive Discord discussion relating to the topic:

  • Journals and conferences that require typst source would have to require all reviewers and editors to trust every document and every system dependency of the document before even reading it
  • You don’t see people writing preprocessors for typst’s alternatives, because typst’s alternatives currently fill the gaps where people would want a preprocessor (e.g. plotting)
  • The correct option for embedding code results is now (and, I hope, will forever be) having your code (which is run by an external interpreter) export some kind of typst-readable format (csv/json) and embedding those results using typst. This is not an ad-hoc solution, rather, it allows the user the most flexibility while retaining security and usability and also being fairly obvious/straightforward and requiring negligible maintenance from the typst team other than supporting readers for common data formats.
  • typst_pyimage, which implements a python preprocessor to generate plots and other images, reflects not so much a user desire to run python code as a user desire for good plots. Running python code is just the best avenue to this result right now. I strongly doubt that there are many (as in, more than three) other possible reasons to require a preprocessor that do not reflect clearly necessary improvements to typst that can be better done within typst than without.
  • Users are lazy, as programmers and security-conscious individuals we are not representative users, and we cannot rely on mere compile-time warnings or CLI flags to protect people from potentially devastating issues.

@Luis-Licea
Contributor

Luis-Licea commented May 1, 2023

As for executing arbitrary code, the only wieldy approach is requiring external tools to return plain text or store data in documents whose paths are accessible to Typst. Typst could read the plain text directly, or, if processing is required, Typst could open images and files in serialization formats like yaml, toml, xml, etc.

For executing external scripts in arbitrary programming languages, I would suggest a function like this:

run(code: string, runner: string) -> string

A usage example would be the following:

let sum = run("print(1 + 9)", runner: "/usr/bin/python3")

let image_path =  run(
   "
   image = some.library.create_image()
   path = '/tmp/example'
   image.save_at(path)
   print(path)
   ", 
   runner: "/usr/bin/python3"
)
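
For reference, the semantics of such a `run` function can be approximated in Python with the standard `subprocess` module. This is a sketch of the idea only, not a proposed implementation: it assumes the runner accepts a `-c` flag (true for Python interpreters, not for arbitrary runners), and the `timeout` parameter is an addition that is not part of the signature above.

```python
import subprocess
import sys
import textwrap


def run(code: str, runner: str, timeout: float = 10.0) -> str:
    """Run `code` with the interpreter at `runner` and return its stdout."""
    result = subprocess.run(
        # dedent, because code embedded in an indented string literal would
        # otherwise fail in Python with an "unexpected indent" error.
        [runner, "-c", textwrap.dedent(code)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout.strip()


# sys.executable is used instead of a hard-coded interpreter path.
total = run("print(1 + 9)", runner=sys.executable)
indented = run(
    """
    x = 2
    print(x * 3)
    """,
    runner=sys.executable,
)
```

Note that the sketch uses `sys.executable` because an absolute path such as `/usr/bin/python3` only exists on some systems.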

@PgBiel
Contributor

PgBiel commented May 1, 2023

As stated in the Discord discussion, I believe it is much better we add a custom plugin system to Typst (or other alternative to run arbitrary code), which would let us sandbox those plugins the way we want and have a "recommended way" to do this sort of thing, instead of forcing the user who wishes to do that to use alternatives such as typst_pyimage which wouldn't necessarily be sandboxed and/or audited by us.

Sure, it is much better to increase Typst's capabilities (e.g. to provide plotting natively). But it's impossible that typst will cover every single use case, and the compiler will always have more power than the language itself. Users will always want to customize things every way they can, and if there's a way to do so without having to use LittleJoe01's custom untrusted compiler which allows for plugins, but rather using Typst's built-in way with permission control, I think this would be much safer.

Always keep this in mind: If the user really wants to do something, they will find a way around it. Of course, this doesn't necessarily mean that we need to adapt ourselves for everything - but since Typst is a language which has programming capabilities, users will eventually need to use plugins for whichever complex processing they're doing in their typst programs. E.g. they might want to use pandas, they might want to use advanced linear algebra functions (which wouldn't be efficient if implemented in Typst), and other things like that.

Being forced to use a preprocessor is hacky and doesn't allow for complex interaction with the AST / elements and input variables in Typst. It's annoying for the same reason that proc macros aren't perfect in Rust - they aren't necessarily aware of the runtime types, and as such can end up having the wrong assumptions regarding types based solely on their names (which is what a Typst preprocessor would have to do - assume things based on how they're written, but the actual runtime values of things would be unknown).

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

Typst fetches the data and returns a path to the resource. Then it is up to the user to load the file appropriately:

let path = get("https://arbitrary_data.com?pandas=yes%20please", always: true)

let image_content = image(path, width: 50%)

Some questions regarding the whole get API:

  • Can you provide some examples where this is needed?
  • Why not download the resources yourself and save them in your project's folder structure?
  • What does it bring that cannot be achieved in pure typst?
  • How do you make this secure?
  • Do you simply trust code implicitly?
  • Do you only allow "trusted packages" to use this?
  • How do you deal with permissions of dependencies?
  • And more precisely, when you set dependency permissions, is it "infectious", i.e. propagating to all of its subdependencies?

Imagine the case where we have a malicious package, call it package_a, that depends on a very common package, say tablex. As it happens, tablex uses this feature to automatically render tables from scraped data, amazing you say! Now, in your project you use tablex so you give it permissions(network: true) 🎉. Now package_a depends on tablex and uses it to circumvent the fact you didn't give it permissions, therefore my question is: are permissions transitive? How do you deal with a malicious package creating elements that do have permission to do web queries?

let sum = run("print(1 + 9)", runner: "/usr/bin/python3")
  • Okay, let's say I am working on this document with you, I run windows or some other distribution that doesn't use /usr/bin/python3 as the default binary for python, how do you solve this?
  • How is the WebApp supposed to be running this code? And all of my previous questions regarding permissions still stand for this as well, except that now it's even worse from a security point of view. How is this an acceptable solution?

Opening the door to these kinds of issues and hand-waving the problem away with "we'll just use permissions" or "just use a CLI flag" is not a solution. I have yet to see, either here or on the long Discord thread, any solution for these problems. Truth be told, I have yet to see an example where you even need these features.

Regarding my opinion:

  • If you need to run external code to get results for your documents, you should do it manually (or using a bash script/make/whatever) beforehand and simply import it in your document
  • If you have a very specific need, such as automatic assembling of documents for automated documentation generation, you should probably do it with an external tool, since you'll probably need settings, a database of documents, etc. And this goes far beyond what you can do in a typesetting software.
  • As @rpitasky mentioned, people are lazy (me included), giving a lazy way of circumventing any and all safety advantages of typst is a recipe for disaster. Can't wait to see the first CVEs for typst.
  • There are features missing in typst, namely plotting, but these can be added to typst, and people can use matplotlib/pandas/etc separately and import them as images (SVGs support text now 🎉) until they are ready.
  • If you are using typst in a Jupyter-like fashion, which is imo a very good use case, you should wait and work on HTML export and write a Jupyter plugin once it's ready that allows replacing markdown with typst.
  • Typst must not scope creep to cover the use cases of every piece of software under the sun, covering the use case of Jupyter is, imo, out of scope. But as previously mentioned, I'd like to see nice support for typst in Jupyter, not the other way around.
  • HTTP requests would not be a good idea for the WebApp as it breaks cross-origin policies (https://developer.mozilla.org/en-US/docs/Web/HTTP/Cross-Origin_Resource_Policy) which, imo, should be enabled everywhere.
  • HTTP requests require the user to set up a web server, I can already see someone making a "shell as a service" and people just using this, which, again, circumvents any security advantages it might have.
  • A proper API for HTTP requests is big, like really big, if you can control headers, handle redirection, etc.
  • I still haven't seen a good use case of shell-escape.
  • I still haven't seen a good use case of http-escape.
  • It breaks memoization. Saying that you rely on HTTP caching is great, but what if there is no HTTP caching? Does typst take latency * processing_time * number_of_requests more time for each incremental compilation?
  • "Temporary solutions often become permanent problems"
  • A plugin system however, keeps some degree of safety, using WASM:
    • It would be encapsulated and therefore safe, it is contained within your document
    • Be far more powerful by having access to internal APIs
    • Likely be faster (at least than slow HTTP requests)
    • Run on every platform under the sun, including the webapp with no compatibility issues, if it compiles to WASM, it runs anywhere (feel like I've heard that before ☕)
    • Could handle memoization explicitly and/or implicitly
    • However, it would take a ton more work, that being said, I believe that's good work

@PgBiel
Contributor

PgBiel commented May 1, 2023

WASM plugins are the ideal implementation of extensions in my opinion. I'm also a bit unsure regarding HTTP, but plugins would theoretically make that possible either way.

are permissions transitive?

Logically, yes. Wouldn't make sense for them not to be, precisely because of what you describe, in the sense that, for plugin A to use plugin B, plugin A will have to ask for, at least, the permissions of plugin B (and that must be made explicit) to function.
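
One way to read "transitive" here, sketched in Python: the permissions a package effectively needs are the union of what it declares and what all of its direct and indirect dependencies declare. The package names come from the example in this thread; the data layout and the `effective_permissions` helper are hypothetical.

```python
from typing import Dict, List, Set


def effective_permissions(package: str,
                          declared: Dict[str, Set[str]],
                          dependencies: Dict[str, List[str]]) -> Set[str]:
    """Union of the permissions declared by `package` and all it depends on."""
    seen: Set[str] = set()
    result: Set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # also protects against dependency cycles
        seen.add(current)
        result |= declared.get(current, set())
        stack.extend(dependencies.get(current, []))
    return result


declared = {"tablex": {"network"}, "package_a": set()}
dependencies = {"package_a": ["tablex"], "tablex": []}

needed_by_a = effective_permissions("package_a", declared, dependencies)
needed_by_tablex = effective_permissions("tablex", declared, dependencies)
```

Under this rule, `package_a` cannot borrow network access through `tablex` silently: its effective set includes `network`, so the user would have to grant it explicitly.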

@kg583
Contributor

kg583 commented May 1, 2023

If I might be so frank, this is a terrible idea, and I would elucidate as much with two main points.

Firstly, as @rpitasky pointed out, one of LaTeX's biggest use cases is providing a raw source to journals and conferences that can then be easily re-formatted into the correct style, incorporated into a larger document, or post-processed for citation and reference analysis. It is critical to the safety and efficiency of journal editors and conference organizers that the source be entirely self-contained.

Now, for the purposes of a journal or conference submission, one could simply omit use of external tools, but the issue here lies in that establishing this system would motivate using GET requests or external program calls for resources at compile-time. A journal specifying that documents cannot access external resources would seem like a primitive restriction rather than a practical measure.

Secondly, external program calls are entirely unnecessary for what Typst does and needs to do, and only introduce unnecessary risk. Compare this proposal to Jupyter notebooks, a common analogy thus far. Jupyter notebooks allow one to embed Python blocks into text and run them asynchronously to generate data/images/whatever else. On the face of it, Typst having this capability is a no-brainer, as evidenced by tools like typst_pyimage.

But, Jupyter has a clear difference from Typst: Jupyter is a Python interpreter. This means that, for example, Google Colab (an online notebook service) doesn't have to worry too much about what a notebook might do; simply provide a container, put in your standard anti-RCE measures, futz with the builtins a bit, and leave it be.

Meanwhile, Typst is given the power to call... absolutely anything on the user's machine. Sure, you could restrict access in all kinds of obvious ways, but that doesn't prevent a document from coming zip'd with a malicious Bash script that a lazy user (or, say, a journal editor) would easily overlook. And this does not even include what you could do with arbitrary Internet access atop it.

The correct procedure for providing the tooling capabilities desired here is to simply require data to be imported from common formats. Typst should handle absolutely 0% of the execution of tools that are not already part of itself and vetted to be safe. Although LaTeX handles this suboptimally in many cases (see any and every thread on StackExchange about importing SVGs), it handles it with the correct intent. Need a better plot? Export to an image. Need a big table? Export to JSON or TOML.
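
As an illustration of this workflow, the external tool only has to serialize its results into one of the common formats. A minimal Python sketch using the standard library follows; the data and helper names are invented for the example.

```python
import csv
import io
import json

# Results computed by some external analysis (made-up numbers).
rows = [
    {"x": 0, "y": 0.0},
    {"x": 1, "y": 1.5},
    {"x": 2, "y": 4.0},
]


def to_json(data: list) -> str:
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(data, indent=2)


def to_csv(data: list) -> str:
    """Serialize the rows as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["x", "y"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(data)
    return buffer.getvalue()


json_text = to_json(rows)
csv_text = to_csv(rows)
```

Written to files in the project folder, these strings could then be loaded with the `json(path)` and `csv(path)` readers shown earlier in the thread, with no code execution on the Typst side.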

If Typst needs to improve what formats it can take in to better make use of existing tooling, we can, nay should, do that. A plugin system might be a fair solution to this since we can restrict calls to a vetted API, though at some point it becomes a question as to what benefit it offers over a combination of regular packages and entirely external tools. All in all, though, security flaws are far too likely for the current proposal to be acceptable, and attention should instead be directed toward improving Typst's native tools and import capabilities.

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

If Typst needs to improve what formats it can take in to better make use of existing tooling, we can, nay should, do that.

I think that in some cases, where you are working with proprietary tooling, it may not be desirable to have this "built-in", this is also an area where the plugin API can fill the gaps. We probably don't want typst to have an overly bloated API as it will make typst unmaintainable, big and slow. But with plugins, we can encapsulate some functionality, let's say plotting, and only include it when it's needed. This keeps the core of typst lean and fast while allowing multiple people to work on different APIs.

One of my original suggestions for the plugin API is something that the game Factorio does: everything is a mod/plugin. That means that by default, you ship the minimal amount of features as part of the main binary and pre-package all advanced features as part of a set of included plugins, allowing the user to easily get started. If we can achieve this, then the plugin API is truly sufficient.

Additionally, this fits with the rust ethos that keeps the standard library simple and lean, leaving more advanced features to officially maintained crates (there are quite a few) and community maintained crates. Typst could use this model to supercharge it and its development.

A plugin system might be a fair solution to this since we can restrict calls to a vetted API, though at some point it becomes a question as to what benefit it offers over a combination of regular packages and entirely external tools.

This is a very good point imo. I would say that the biggest advantage of a WASM plugin system is the ability to interface directly with primitives:

  • You could create custom types and methods (although this might be in the language at some point, so I don't know)
  • You can access Layout primitives
  • You can implement any "capability" on an element (say Layout on TableElem)

However, as far as the rest is concerned, I agree with you as per my previous post.

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

WASM plugins are the ideal implementation of extensions in my opinion. I'm also a bit unsure regarding HTTP, but plugins would theoretically make that possible either way.

are permissions transitive?

Logically, yes. Wouldn't make sense for them not to be, precisely because of what you describe, in the sense that, for plugin A to use plugin B, plugin A will have to ask for, at least, the permissions of plugin B (and that must be made explicit) to function.

One of the big advantages with the plugins, is that you don't really need permission except maybe file system access. But as previously mentioned, you'd want something that restricts access to the current "project" directory either way.

@PgBiel
Contributor

PgBiel commented May 1, 2023

Yeah, although, if we really want the system to be as flexible as possible, then plugins will have to be able to ask the user for broader permissions (including network). In fact, to be concise, I think we should use something similar to the Flatpak model: by default, the app just specifies what it needs to run (e.g. filesystem access, environment variables, ...), and the user chooses what it can and can't use in the end (and can extend permissions or restrict them - if the app/plugin/... is broken, ok, the user chose that). See the Flatseal app for some inspiration here.

Of course, that means that some plugins won't work in the web app - e.g. plugins that require executing certain specific binaries in the PATH. But that's usually ok, as you'd simply avoid those plugins then (or deny those specific permissions). The main point is that you'll be able to know, beforehand, everything that a plugin needs to use from your system.
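
A small Python sketch of this "plugin requests, user decides" model. It is loosely inspired by the description above and does not reproduce Flatpak's actual mechanism; the permission names and the `resolve_permissions` helper are hypothetical.

```python
from typing import Set, Tuple


def resolve_permissions(requested: Set[str],
                        extend: Set[str],
                        restrict: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Combine a plugin's requested permissions with the user's overrides.

    Returns (effective, missing): `effective` is what the plugin may use,
    `missing` is what it asked for but did not get, so it may not work fully.
    """
    effective = (requested | extend) - restrict
    missing = requested - effective
    return effective, missing


# The plugin declares its needs up front; the user denies network access.
effective, missing = resolve_permissions(
    requested={"filesystem", "network"},
    extend=set(),
    restrict={"network"},
)
```

Because the request is declared up front, the user can see everything a plugin wants before running it, and a non-empty `missing` set signals that the plugin may be broken by the user's choice.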

And, of course, using WASM plugins is better to be able to use the compiler's API (and to make sandboxing viable).

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

My problem with any "permission" system, outside of filesystem access, is that it introduces a break in the ecosystem, the Web app shouldn't be able to run random network requests, even if the user opts-in. Therefore, it makes it such that you cannot create the same document on the web app as on a desktop.

e.g. plugins that require executing certain specific binaries in the PATH

That's shell-escape in a nutshell: it allows bypassing the encapsulation of WASM, and if we are going to do that, we might as well just allow shell-escape, since it at least avoids the hassle of building a plugin API.

As soon as you introduce a permission system, it causes provenance issues: if plugin A creates an element that comes from plugin B, then that element must either include the permissions of plugin B or "locally" override its permissions, which is going to be a whole mess and make plugin development difficult. Therefore, no network access and no shell access is best imo.

I would like to see a WASM API that only has access to the file system, through a permission, and nothing else. Or even better: no access to the file system outside of the project folder and therefore no permission.

@rpitasky
Contributor

rpitasky commented May 1, 2023

Why is a plugin system even necessary? There are significant downsides to any extension system, including but not limited to user fragmentation, stability, needing to maintain a plugin-facing API, and complexity. I emphatically believe such a system is worse, for many of the same reasons @kg583 mentioned. Words below are his but emphasis is mine; I strongly believe there is no benefit.

A plugin system might be a fair solution to this since we can restrict calls to a vetted API, though at some point it becomes a question as to what benefit it offers over a combination of regular packages and entirely external tools. All in all, though, security flaws are far too likely for the current proposal to be acceptable, and attention should instead be directed toward improving Typst's native tools and import capabilities.


From @PgBiel

Always keep this in mind: If the user really wants to do something, they will find a way around it. Of course, this doesn't necessarily mean that we need to adapt ourselves for everything - but since Typst is a language which has programming capabilities, users will eventually need to use plugins for whichever complex processing they're doing in their typst programs. E.g. they might want to use pandas, they might want to use advanced linear algebra functions (which wouldn't be efficient if implemented in Typst), and other things like that.

I firmly believe that using an external tool to generate typst-readable files is not a workaround that users have to find, again for the reasons that kg mentioned.

@PgBiel
Contributor

PgBiel commented May 1, 2023

Why is a plugin system even necessary?

Just think about the hundreds of feature requests that Typst gets. There is no way we will be able to implement everything in the standard library. Sure, you could use typst packages - but there are many things which Typst can't do without the help of Rust code. I mentioned a few examples, like pandas or linear algebra, but there are certainly hundreds of other use-cases.

I firmly believe that using an external tool to generate typst-readable files is not a workaround that users have to find, again for the reasons that kg mentioned.

It is a workaround, because now they have to move everything to typst-readable files, in order to be able to use their data generated in typst with their workflow. All data that they used to manage in their typst file will have to be managed in other formats, and the typst file will be just a ton of calls to read, json, ..., and this isn't even considering how data in Typst can change based on additions to the document. At this point, they will reconsider even doing anything in Typst. After all, Typst is supposed to allow markup but more, including complicated calculations. A mathematician might very well wish to generate things for their lecture on matrices and vectors using a proper linear algebra library, for example. I'm mentioning very basic examples, but truth is that there are TONS of specific cases that Typst itself won't handle. And that's where plugins conveniently come in: using a safe and controlled sandbox system, they will be able to implement new elements which the users can use and bring custom Rust functionality. After all, we can't expect Typst to cover every single possible case under the sun. And, when the user really needs such functionality (e.g. for their thesis), they will just monkey-patch the compiler for it, which is undesirable.

@rpitasky
Contributor

rpitasky commented May 1, 2023

Just think about the hundreds of feature requests that Typst gets. There is no way we will be able to implement everything in the standard library. Sure, you could use typst packages - but there are many things which Typst can't do without the help of Rust code.

This is nothing short of a ridiculous philosophy for a piece of open-source software to have. If you have the technical ability and inclination to learn the proposed typst plugin api and implement a rust plugin, you have the technical ability and motivation to learn the internal typst API and make a pull request. No other piece of software I have ever heard of does anything like this. This might make sense for closed-source software but regardless it additionally has the same fundamental security and trust problems as executing arbitrary code in any language.

I mentioned a few examples, like pandas or linear algebra, but there are certainly hundreds of other use-cases.

Typst is not for general computation! If you’re working with the sorts of data that pandas implies, you should be outputting your results in some CSV and using typst scripting to format your results. KG’s response drives this home very well, so I’m not going to repeat it here.

As for your example of a lecturer, I believe this linear algebra beyond basic support for vectors/matrices absolutely is the responsibility of a typst library. If you need something heavy-duty, just save the results to a text file. The alternative of allowing plugins to run anything is, in KG’s words, frankly a terrible idea.

@PgBiel
Contributor

PgBiel commented May 1, 2023

If you have the technical ability and inclination to learn the proposed typst plugin api and implement a rust plugin, you have the technical ability and motivation to learn the internal typst API and make a pull request.

Oh, please don't get me wrong - you can make a pull request! Problem: now Laurenz is responsible for the code you added. Laurenz will have to maintain your code. Sure, you can make more PRs later - but nothing stops you from just never coming back. So, if you implement something catering to a very specific use-case, it will likely remain outdated until the end of times. In the end, either this repository will be flooded with PRs like "Updates specific use-case X to consider Y", giving more work for the Typst maintainers, or there will be a bunch of issues from people saying things like "Hello, I tried to use the Russian-converter library module, but it doesn't seem to support the language from the Kirov russian oblast, please help!". See the problem? In the end, it's better that specific things are handled and maintained by people who know how to maintain that thing. And the best part? If the plugin goes unmaintained, others will be able to fork, or create alternatives. Plugins are what enable the community to participate more deeply, more easily!

No other piece of software I have ever heard of does anything like this.

You haven't? Here are a few examples: Programming languages in general; the Linux kernel; OBS; LuaTeX; ...

This might make sense for closed-source software but regardless it additionally has the same fundamental security and trust problems as executing arbitrary code in any language.

Of course. The burden then goes on the user to give the appropriate permissions to the plugin, and also to download them from reputable authors.

Typst is not for general computation! If you’re working with the sorts of data that pandas implies, you should be outputting your results in some CSV and using typst scripting to format your results. KG’s response drives this home very well, so I’m not going to repeat it here.

But you can use it for that! And maybe what you want to do isn't too complex - maybe you just want to multiply a few matrices here and there... and using a native library would make compilation much, much faster.

As for your example of a lecturer, I believe this linear algebra beyond basic support for vectors/matrices absolutely is the responsibility of a typst library. If you need something heavy-duty, just save the results to a text file.

Problem: Your matrix uses some data that you define in your typst file. Now you have to move all your data to a file, and all your logic to a Python script. Why do that when you can do everything in typst?

The alternative of allowing plugins to run anything is, in KG’s words, frankly a terrible idea.

Run anything that you explicitly allow to be run. Important to mention!

@kg583
Contributor

kg583 commented May 1, 2023

I firmly believe that using an external tool to generate typst-readable files is not a workaround that users have to find, again for the reasons that kg mentioned.

It is a workaround, because now they have to move everything to typst-readable files, in order to be able to use their data generated in typst with their workflow. All data that they used to manage in their typst file will have to be managed in other formats, and the typst file will be just a ton of calls to read, json, ..., and this isn't even considering how data in Typst can change based on additions to the document. At this point, they will reconsider even doing anything in Typst.

Moving everything to typst-readable files is really not a big ask if we do our due diligence. Take your favorite programming language and look at what formats it handles in its standard library. Whatever the answer is, you now know a) these formats are (approximately) easy to parse and b) that enough people use them to warrant inclusion in the standard library. I could tell you the top 5 right now that cover the vast majority of text interchanged between software: JSON, TOML, YAML, CSV, and XML. Tack on DSLs like GraphML and your top 5 image formats and you're golden.
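As a rough sketch of this workflow, an external script can do the heavy computation and leave behind files in formats Typst already reads. The function names, the `data` directory, and the file names `product.json` and `product.csv` below are assumptions made for illustration; nothing here is part of Typst itself.

```python
import csv
import json
from pathlib import Path


def matmul(a, b):
    """Multiply two matrices given as lists of rows (pure Python)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def export_results(matrix, out_dir):
    """Write the matrix as JSON and as CSV so a document can load either one."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "product.json"  # hypothetical file name
    csv_path = out_dir / "product.csv"    # hypothetical file name
    json_path.write_text(json.dumps({"product": matrix}))
    with csv_path.open("w", newline="") as f:
        csv.writer(f).writerows(matrix)
    return json_path, csv_path


if __name__ == "__main__":
    product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    export_results(product, "data")
```

On the Typst side, the document would then load the result with the built-in `json("data/product.json")` or `csv("data/product.csv")` functions and format it with ordinary scripting.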

A mathematician might very well wish to generate things for their lecture on matrices and vectors using a proper linear algebra library, for example.

As a mathematician, I use SageMath very often for visualization, and SageMath lets me do two wonderful things: export plots to PNG and generate a LaTeX representation of an object. Making this just as utilitarian for typst as it is for LaTeX is straightforward. Since PNG support is already squared away, one would just need a method to generate a typst representation; this is something that could eventually be merged into SageMath once typst is robust and popular enough, but until that point one could write a package in SageMath that does the task.

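# Hypothetical `typst` package for SageMath, shown for illustration only
from typst import Typst
from sage.graphs.digraph import DiGraph

# Build a directed 3-cycle: 1 -> 2 -> 3 -> 1
G = DiGraph()
G.add_edges([(1, 2), (2, 3), (3, 1)])

# Wrap the graph and ask for its Typst representation
T = Typst(G)
T.repr()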

The above example is very minimal, as such a Typst object could presumably do plenty of other useful things. And indeed, a SageMath package would be entirely more capable of generating a representation than a typst plugin because SageMath is a proper programming language (literally Python with some bells and whistles) and the plugin API would have to be rather bloated to match. The only potential utility is rendering an object directly, but this feels like far too little gain.

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

The only potential utility is rendering an object directly, but this feels like far too little gain.

How is that far too little gain when it's the core of typst as a tool?

Additionally, it could allow support of non-trivial formats or proprietary formats, of which there are tons. I've used this example before, but a QR code generator makes perfect sense to have as a plugin: it does something that you cannot easily do in typst (implement a full QR code system; there are rust crates for that), it is self-contained, it can be memoized very effectively, and it makes sense for it to be part of your document. The same goes for plotting.

Note: again, I think that plugins should be completely self-contained and not access anything outside of your typst project.
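To illustrate why a self-contained plugin memoizes well, here is a minimal Python sketch. It is not Typst's plugin API: `render_code` is a hypothetical stand-in for a bytes-in, bytes-out plugin function, and reversing the payload is only placeholder work. Because the output depends solely on the input, a cache can safely hand back the earlier result.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def render_code(payload: bytes) -> bytes:
    """Hypothetical stand-in for a pure plugin function (bytes in, bytes out)."""
    # Placeholder work: a real plugin might compute a QR code matrix here.
    return bytes(reversed(payload))


if __name__ == "__main__":
    render_code(b"https://typst.app")  # computed once
    render_code(b"https://typst.app")  # answered from the cache
    print(render_code.cache_info())
```

A function that read files or the network could not be cached this way, since the same arguments might produce different results on the next call.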

@kg583
Contributor

kg583 commented May 1, 2023

No other piece of software I have ever heard of does anything like this.

You haven't? Here are a few examples: Programming languages in general; the Linux kernel; OBS; LuaTeX; ...

You are forgetting the scope of this project in an attempt to be snarky. Typst is designed to render documents. Half of the examples above are designed to execute arbitrary code. We don't need to be doing that and I have no idea why this is a contentious take.

Of course. The burden then goes on the user to give the appropriate permissions to the plugin, and also to download them from reputable authors.

Yes! This is an absolutely huge and dangerous ask for the average user! Every other security exploit in existence is the consequence of somebody running something they shouldn't have or have been able to in the first place.

Typst is not for general computation! If you’re working with the sorts of data that pandas implies, you should be outputting your results in some CSV and using typst scripting to format your results. KG’s response drives this home very well, so I’m not going to repeat it here.

But you can use it for that! And maybe what you want to do isn't too complex - maybe you just want to multiply a few matrices here and there... and using a native library would make compilation much, much faster.

Yes, you can, but that doesn't mean we need to be able to do everything. Scope is again an important thing to keep in mind, as I'd rather do some things well than everything half-decently.

As for your example of a lecturer, I believe this linear algebra beyond basic support for vectors/matrices absolutely is the responsibility of a typst library. If you need something heavy-duty, just save the results to a text file.

Problem: Your matrix uses some data that you define in your typst file. Now you have to move all your data to a file, and all your logic to a Python script. Why do that when you can do everything in typst?

For the same reason I don't generate all my diagrams in TikZ even though I can; it's simply easier and more expressive to draw them with other software.

The alternative of allowing plugins to run anything is, in KG’s words, frankly a terrible idea.

Run anything that you explicitly allow to be run. Important to mention!

That the author of the document explicitly allows to be run. Again, take the view of a journal editor: you've just received a manuscript that claims to generate its figures using an outside repository of raw data. There's a script contained in the submission that typst runs (via, per the quote I'm replying to, a runner rather than a plugin API) to make the figures... except it doesn't do that, at all, and instead crawls the editor's PC to view unpublished manuscripts and communications and whisk them off to the author.

Alright, so what's the solution? Make execution a per-user configuration? Fine, now the editor won't get figures at all, and so would have to request that all sources be self-contained, entirely defeating the point of the integration. These worries should not ever have to cross anyone's mind when compiling a typst document, and I reiterate how stunned I am that this is even a question.

@PgBiel
Contributor

PgBiel commented May 1, 2023

You are forgetting the scope of this project in an attempt to be snarky. Typst is designed to render documents. Half of the examples above are designed to execute arbitrary code. We don't need to be doing that and I have no idea why this is a contentious take.

Sorry if I didn't write this the right way, I didn't mean to come across as snarky. What I was replying to was their assertion (or, at least, the one that I interpreted from their message) that they haven't seen other projects which enable the use of plugins to extend what is possible. Since Typst is a markup language, LuaTeX was included to ensure I wasn't deviating too much from the point here.

Yes! This is an absolutely huge and dangerous ask for the average user! Every other security exploit in existence is the consequence of somebody running something they shouldn't have or have been able to in the first place.

Sure, but then you wouldn't download anything to your computer. If the user is willing to do something, and is aware of the consequences, then let them!

Yes, you can, but that doesn't mean we need to be able to do everything. Scope is again an important thing to keep in mind, as I'd rather do some things well than everything half-decently.

We don't need to, and that's precisely why we shouldn't bloat the standard library with tons of use-cases, instead delegating them to compiler extensions.

For the same reason I don't generate all my diagrams in TikZ even though I can; it's simply easier and more expressive to draw them with other software.

Of course, you're free to do that. But many people prefer to generate it in the document.

That the author of the document explicitly allows to be run. Again, take the view of a journal editor: you've just received a manuscript that claims to generate its figures using an outside repository of raw data. There's a script contained in the submission that typst runs (via, per the quote I'm replying to, a runner rather than a plugin API) to make the figures... except it doesn't do that, at all, and instead crawls the editor's PC to view unpublished manuscripts and communications and whisk them off to the author.

Alright, so what's the solution? Make execution a per-user configuration? Fine, now the editor won't get figures at all, and so would have to request that all sources be self-contained, entirely defeating the point of the integration. These worries should not ever have to cross anyone's mind when compiling a typst document, and I reiterate how stunned I am that this is even a question.

Usually, journals will set rules and/or style guides to avoid this kind of thing. E.g.: we don't allow this or that extension (if any at all). Papers will be rejected if you don't provide your figures as svg. Etc. That's just how things work in real life. The journal editor won't run malicious code by accident because either Typst will warn them that the plugin is missing, or ask them to give permissions to plugin X or Y (then it will be up to them to allow said plugins to do anything to their computer).

@rpitasky
Contributor

rpitasky commented May 1, 2023

I think it is much, much more likely that a journal would go “we don’t allow typst plugins at all” or (perhaps even more likely) “we don’t allow typst at all” than single out individual plugins in a constantly updating ecosystem.

@PgBiel
Contributor

PgBiel commented May 1, 2023

I think it is much, much more likely that a journal would go “we don’t allow typst plugins at all” or (perhaps even more likely) “we don’t allow typst at all” than single out individual plugins in a constantly updating ecosystem.

And that's a perfectly valid ask. People writing to that journal would then be aware of the rules and avoid using them (or use LaTeX instead, in the latter case), assuming the journal uses proper communication.

@kg583
Contributor

kg583 commented May 1, 2023

Yes! This is an absolutely huge and dangerous ask for the average user! Every other security exploit in existence is the consequence of somebody running something they shouldn't have or have been able to in the first place.

Sure, but then you wouldn't download anything to your computer. If the user is willing to do something, and is aware of the consequences, then let them!

You are far more optimistic about the average user's competency and attention to consequences than I, but that isn't even the main concern. The fact of the matter is that I simply do not worry about what gets run when I compile a TeX document. If typst cannot offer the same security, I'm not touching it anymore.

That the author of the document explicitly allows to be run. Again, take the view of a journal editor: you've just received a manuscript that claims to generate its figures using an outside repository of raw data. There's a script contained in the submission that typst runs (via, per the quote I'm replying to, a runner rather than a plugin API) to make the figures... except it doesn't do that, at all, and instead crawls the editor's PC to view unpublished manuscripts and communications and whisk them off to the author.

Alright, so what's the solution? Make execution a per-user configuration? Fine, now the editor won't get figures at all, and so would have to request that all sources be self-contained, entirely defeating the point of the integration. These worries should not ever have to cross anyone's mind when compiling a typst document, and I reiterate how stunned I am that this is even a question.

Usually, journals will set rules and/or style guides to avoid this kind of thing. E.g.: we don't allow this or that extension (if any at all). Papers will be rejected if you don't provide your figures as svg. Etc. That's just how things work in real life. The journal editor won't run malicious code by accident because either Typst will warn them that the plugin is missing, or ask them to give permissions to plugin X or Y (then it will be up to them to allow said plugins to do anything to their computer).

I'm well aware of how things work in real life; I've got multiple publications, which is exactly why I voiced my concerns as I did. Journals would rather block everything than some things, at which point we've lost all utility.

@PgBiel
Contributor

PgBiel commented May 1, 2023

You are far more optimistic about the average user's competency and attention to consequences than I, but that isn't even the main concern.

Hence why Typst should make very explicit warnings about permission control.

The fact of the matter is that I simply do not worry about what gets run when I compile a TeX document.

I don't get it - TeX has shell-escape (which effectively allows arbitrary plugins), so how can you be so sure of that?
I'm assuming you don't enable shell-escape - and Typst would give you that exact same permission control, but with much more granularity. Journals will be able to completely control the sandbox if they want to, compared to just enabling or disabling shell-escape. I think we can agree that this is better than TeX's system, right? And therefore, if journals are using TeX, which has a much worse permission control system than the one proposed here, then they can use Typst as well.

If typst cannot offer the same security, I'm not touching it anymore.

As stated above, it'd offer even better security.

I'm well aware of how things work in real life; I've got multiple publications, which is exactly why I voiced my concerns as I did. Journals would rather block everything than some things, at which point we've lost all utility.

I mean, they probably don't allow TeX shell-escape already (instead of giving up on TeX), so I can't see how they would be disappointed having much better permission control at their disposal.

@PgBiel
Contributor

PgBiel commented May 1, 2023

(Not to mention that journals would be able to provide their own plugins for their own customization, so this system can be even more of a plus.)

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

If typst cannot offer the same security, I'm not touching it anymore.

That comes off as a bit abrasive. I know that we all want the best for typst and we all have strong opinions, but we need to converse in a friendly manner so as to avoid partisanship. Additionally, this just sounds like those people going "Well, I'll just fork it".

As stated above, it'd offer even better security.

To me, that statement can only really be true with self-contained plugins that have no system access outside of the project itself.

Journals will be able to completely control the sandbox if they want to

The only use for LaTeX shell-escape I ever encountered was syntax highlighting with some library (not listings, but another one whose name I couldn't be bothered to remember). That library uses Python to do syntax highlighting, could it be used in a published paper? (note that I have never (yet) published a paper so I am a bit less experienced on that topic).

Because if it can, then publishers already allow a pretty big security hole imo.

@Enivex
Collaborator

Enivex commented May 1, 2023

Anything requiring shell-escape in LaTeX would never be accepted by any reputable journal.

@johannes-wolf
Contributor

johannes-wolf commented May 1, 2023

What about an option to the typst compiler to export certain elements, e.g. raw blocks?
A user wanting to process code from the typst document could invoke the typst compiler, collect all code blocks of interest and do whatever they want with them (generate some diagrams etc.). Those elements (raw blocks) would have a unique assignable name that can then be used to reimport the generated data.

That would create no security problems beyond exporting files inside the current root. What is done with those files is up to the user or their build environment.

TLDR: Feature to export raw blocks (or maybe a special block) from typst to files or pipes, the user can then process them. E.g. some support for processing data from typst without having to write some kind of preprocessor.

@Dherse
Sponsor Collaborator

Dherse commented May 1, 2023

TLDR: Feature to export raw blocks (or maybe a special block) from typst to files or pipes, the user can then process them.

While that does fix the security issue, I have to ask, how is that different from the following:

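// Read the Python source from a file inside the project
#let my_python_code = read("my_python_code.py")

// Display it as a syntax-highlighted raw block
#raw(my_python_code, lang: "python")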

Why does the code need to be inside of typst?

Don't get me wrong, I think this is a feature that could 100% be in typst and has its uses, for example to build a testing framework that checks that all of your code snippets compile/work. But it feels more tangential to the issue we're discussing here.

I have previously said that I am in favour of people "pre-processing" their data, which is already what we're all doing by exporting figures, etc. and importing them as images in typst code.

@PgBiel
Contributor

PgBiel commented May 1, 2023

As stated above, it'd offer even better security.

To me, that statement can only really be true with self-contained plugins that have no system access outside of the project itself.

I don't think so, given that TeX shell-escape already gives unrestricted access to the system. So, with Typst users being able to arbitrarily restrict access to the system, I believe it'd be objectively better than TeX's current external tool system.

The only use for LaTeX shell-escape I ever encountered was syntax highlighting with some library (not listings, but another one whose name I couldn't be bothered to remember). That library uses Python to do syntax highlighting, could it be used in a published paper? (note that I have never (yet) published a paper so I am a bit less experienced on that topic).

I've used shell-escape before with a tool that caches TikZ figures in the project directory to speed up compilations. That seems like a legitimate use of plugins for me, actually. However, using shell-escape was bad not because of the plugin, but because of the unrestricted access to the system. A proper permissions system (where we'd just say "sure, you can write .png files to the project directory") would work fine here.
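A rule like "you can write .png files to the project directory" could be checked along these lines. This is only a sketch of the idea in Python; `may_write` and its parameters are hypothetical names and do not correspond to any existing Typst API.

```python
from pathlib import Path


def may_write(project_root, target, allowed_suffixes=(".png",)):
    """Return True if `target` stays inside `project_root` and has an allowed suffix."""
    root = Path(project_root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a path
    # cannot escape the project by way of "../" or a link.
    path = (root / target).resolve()
    inside = root in path.parents
    return inside and path.suffix.lower() in allowed_suffixes


if __name__ == "__main__":
    print(may_write("/tmp/proj", "figures/plot.png"))
    print(may_write("/tmp/proj", "../secret.png"))
```

The check is deliberately allow-list based: anything that is not explicitly inside the root with a permitted suffix is refused.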

@FeldrinH

FeldrinH commented Aug 6, 2023

What's the plan for using query+preprocessing based tools with the web app? Is that just impossible and if so, what is the recommendation for people who prefer the web app but need some preprocessing-based external tool?

@laurmaedje
Member Author

What's the plan for using query+preprocessing based tools with the web app? Is that just impossible and if so, what is the recommendation for people who prefer the web app but need some preprocessing-based external tool?

Tools that could be packaged as WebAssembly can be integrated once WebAssembly plugins ship (probably with 0.8, the release after the next one).

Tools that cannot be packaged in such a way are just really hard to integrate. If you need to preprocess infrequently, you can download the project, run the query and tools locally and upload the artifacts to the web app.

We might also add an API for downloading/uploading files automatically, but that would be a while down the road and probably an enterprise feature.

@FeldrinH

FeldrinH commented Aug 7, 2023

Tools that could be packaged as WebAssembly can be integrated once WebAssembly plugins ship (probably with 0.8, the release after the next one).

Tools that cannot be packaged in such a way are just really hard to integrate. If you need to preprocess infrequently, you can download the project, run the query and tools locally and upload the artifacts to the web app.

So the assumption is that most tools that need to be run frequently will be packaged as WASM plugins?

@laurmaedje
Member Author

WebAssembly plugins are now available: #1555

@laurmaedje
Member Author

So the assumption is that most tools that need to be run frequently will be packaged as WASM plugins?

We'll find out.

@iilyak

iilyak commented Aug 28, 2023

It might be too late already since the wasm plugin system has already shipped. However, this looks very interesting:

https://github.com/extism/extism

@Andrew15-5
Contributor

Andrew15-5 commented Oct 2, 2023

Am I too late for the RFC? Anyway, I wanted to share my thoughts. This is based purely on my experience and opinion (obviously).

No one uses plain TeX right now, as LaTeX is far easier, and I think most packages assume that you use LaTeX syntax. But when I ran into Unicode/encoding issues, I quickly switched to LuaLaTeX. You know what this is? This is a beast that changed my life (and I only used it for about a year). Not only is it a LaTeX that can create PDF files and an upgrade that supports Unicode (I can write in any/many language(s) in the document), it is also a whole Lua inside the document. And you know what that is? It is both a minimalistic programming language and a path to the outer world with os.execute (basically). This means that all the tools, programs and stuff can be read/executed and used inside the LaTeX and PDF file. I heavily used minted with tcolorbox for inserting syntax-highlighted code in the document (which Typst supports natively, and it's wonderful). Then I discovered the pyluatex package, which basically provides "native" Python support in lualatex. So at the end of the day, lualatex is supercharged with Lua and Python, which gives the author unlimited power (you can execute anything you want).

I will stop at that, because I can talk about things like using PRNG from Python in your LaTeX document and many other things for hours (maybe not that long). The point is that I want to have such unlimited power with Typst too.

Now I will address the main issues that everyone is talking about.
Yes, this is a huge security risk; yes, everyone needs to trust the packages that use this feature; yes, journals will reject such documents (I have no experience with that). But...

  • there are clearly a lot of benefits in providing this opt-in feature
  • if it is opt-in, then people that don't want it will be safe (if used packages also don't need this feature)
  • if people need this, then they will potentially put themselves at risk, but this program, like all other FOSS projects, is provided AS IS with no warranty and no liability, which means that people only have themselves to blame (if the warning about using shell escape is shown by default, then people will be 100% warned, and if they are willing to ignore that, then again the blame is on them)
  • having this super-duper incredibly useful feature is much-much better, than not having it at all (which results in custom preprocessing scripts and things like that)
  • if the institute/place where you want to submit your document will reject it if it uses external tools, then you just need to make it not use external tools (prepare all assets before compiling the PDF and then submit the paper)
  • if the feature is implemented, then the package's manual and/or package page (including the Typst website) has to have a noticeable shell escape warning to maximize the user's awareness of the risks that they consent to
  • of course there will be platform-dependent tools and programs, but almost 100% of documents will be created from the single machine anyway. If the author uses external stuff, then the document will only be created on their machine, or they will provide documentation on how to compile the document (which stuff to install/run), or there is a tool called Docker, which eliminates the platform issue altogether.

TL;DR: there are a lot of risks and cons to using shell escape, but they only apply if the option is enabled and used. If the option is disabled by default, then all the issues suddenly disappear (along with all the benefits, of course). This is just a "tool" for people that need it and can use it (no restrictions on how to make PDF files). Ultimately, just as lualatex was created as a successor to pdflatex, a fork of Typst that enables shell escape will eventually be created if Typst itself won't do it.

P.S. Yes, I guess Typst has WASM support, but where do I get these "plugins"? Another "plugin" manager/registry/repository? It's not useful if a common user can't find any Typst WASM plugins (I haven't found any useful links in the Typst documentation).

@Dherse
Sponsor Collaborator

Dherse commented Oct 2, 2023

there are clearly a lot of benefits in providing this opt-in feature

None that cannot be achieved more safely and with more compatibility using typst query and plugins.

if it is opt-in, then people that don't want it would be safe (if used packages also don't need this feature)

The reason why most OSes like Windows and macOS ship with a lot of security features by default is that people don't necessarily understand the ramifications of the actions they take on their computer. As such, it is in the best common interest to keep things safe. The goal is not for typst to become an open door to RCEs.

if people need this, then they will potentially put themselves at risk, but this program, like all other FOSS projects, is provided AS IS with no warranty and no liability

Saying "oh we have no liability therefore it's okay" is akin to moral bankruptcy. Just because you can do something, or let somebody else do it for that matter, doesn't mean you should.

having this super-duper incredibly useful feature is much-much better, than not having it at all

We have it. @laurmaedje and I did discuss on Discord that we'd like a way of getting "real time" information from typst query, perhaps the ability to combine typst watch + query, so that it is more streamlined and easier to integrate into tools like notebook apps. While it is in theory equivalent, it requires the user to take manual steps and ownership of their pipeline, not simply toggle a boolean.

have a noticeable shell escape warning to maximize the user's awareness of the risks that they consent to

In my country (Belgium), there are signs every few kilometers saying "Focus op de weg, niet op je GSM" meaning "focus on the road, not on your phone", I can confidently say that those are not sufficient, warning people is not sufficient, and most people don't understand until they've crashed because they were looking at their damn phone. While this is a bit of a false equivalence, I hope it illustrates that people don't read warning signs.


but almost 100% of documents will be created from the single machine anyway

Unless you're doing collaborative editing, which, judging by the popularity of platforms like Office 365 and Overleaf, is not a far-fetched idea. Keeping documents self-contained is the best way to keep the ecosystem tightly grouped and to keep the CLI, Web, and future Desktop targets from all being disparate.

but where do I get this "plugins"?

Plugins are just packages in the package registry, you wouldn't even know it's a WASM-based package, that's the awesome part. Authors ship the WASM file with their package and load it via the plugin function, and it's there. Plugins are also usually wrapped in a nicer API layer.

I am sorry for sounding so dismissive, but it feels like I've been saying this over and over again. I understand why people want shell-escape, but shell-escape comes at a cost that breaks the very fundamental concept of tools like typst (or Word for that matter): compile anywhere. In addition, it comes with a security burden that a project like typst simply cannot control. At least with typst watch, if people want to call out to a shell, it's done outside of our control and any safety issues that come from it are solely on the user. For example, I would strongly recommend against evaluating arbitrary code obtained using typst query. People still do it, mind you; the only use case I could see is preparing a document, such as automatically exporting/importing figures.

@astrale-sharp
Contributor

I'll say it again, I see nothing that you cannot achieve with preprocessing: typst can read files!
Opt-in would fracture the usage and I think it would be a bad decision in the long run.

@Andrew15-5
Contributor

Plugins are just packages in the package registry

Oh yeah, I've read that a WASM plugin should be wrapped in a Typst wrapper function. So that's what it was about.

https://typst.app/docs/reference/foundations/plugin/:

plugin

A WebAssembly plugin.

This is advanced functionality and not to be confused with Typst packages.

This sentence threw me off. "Not to be confused with" sounds like "plugins are not packages, they are completely different things". Which is not 100% true, as plugins can be used in packages, so they are almost like packages, but in WASM. So maybe rephrasing it or making things more clear would be nice.

Ok, alright, "plugins are great", but I'm missing the "here is a list of things that you can do with plugins/WASM, which completely removes the necessity of using shell escape for accessing the power of Python or whatever". I need to make it clear that I'm not familiar with WASM that much except that "wow, it's like JS, but better (in some ways)" and that you can compile a program to WASM from many popular PLs.

Hmm, now that I wrote that, I'm beginning to see the bigger picture. Do I understand correctly, that I can make a Python program that I would like to call from Typst on compilation, but instead I compile it directly from Python to WASM and then use it (WASM) directly in Typst? So, basically executing Python in Typst with (many) extra steps.

Also, please, can someone provide a (small or big) list of ways to use typst query, because right now I don't see the value of this command. Like, "who" exactly should use typst query? I don't want to, because the goal is to automate stuff. Do I need to run something besides Typst? Some "preprocessing" stuff? I'm sorry, but I don't get it, and there are no examples at https://typst.app/docs/reference/meta/query/#command-line-queries.

Yes, and all this "preprocessing" stuff (with typst query) is also very confusing. I understand what preprocessing means generally and how typst_pyimage (mostly) works. But could anyone provide specific useful examples of this "preprocessing" and how it is done (with what tools etc.)?

I feel like I'm completely out of the loop (which is probably true), but this also shows how an average savvy user can be confused, let alone Windows Word users that never used Typst, CLI, WASM or never even written a simple program. Maybe a new guide about this preprocessing/WASM/plugin/query stuff should be added.

@Dherse
Sponsor Collaborator

Dherse commented Oct 2, 2023

With regard to what you can do: pretty much anything that doesn’t talk to the world (i.e., networking, file writes, etc.). You can absolutely run pure Python by creating a plugin that uses RustPython interpreter, it’s slower than CPython or PiPi, but it should be sufficient for most typst-related uses.

I have been planning (but I have my hands full with #2282) to write a few plugins like a (good) PRNG, common hashing function (mostly for a weird idea by @edhebi), and finally an image processing library (after all I already wrote the code for it). But then again, I only have a limited amount of time and patience.

Typst query is used for querying metadata tags from a document, a fairly common use afaik is getting notes out of slides. I think @astrale-sharp used it to get answers out of test question for her students. And some people are using it to plot data externally and then embed it automatically. In fact, I am pretty sure somebody wrote a library for typst as a notebook.
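As a rough sketch of the "notes out of slides" use: assuming each note is tagged in the slides as `#metadata("...") <note>`, `typst query` prints a JSON array of the matching elements, and a small script can pull out the `value` fields. The `<note>` label, the file name `slides.typ`, and both function names are placeholders for illustration, not something from this thread.

```python
import json
import subprocess


def extract_values(query_json: str) -> list:
    """Return the `value` field of every element in `typst query` JSON output."""
    elements = json.loads(query_json)
    return [element["value"] for element in elements if "value" in element]


def query_notes(path: str = "slides.typ", selector: str = "<note>") -> list:
    """Run `typst query` (needs the typst CLI on PATH) and extract the values."""
    result = subprocess.run(
        ["typst", "query", path, selector],
        capture_output=True, text=True, check=True,
    )
    return extract_values(result.stdout)
```

The parsing is kept separate from the CLI call so it can be tried on saved query output without Typst installed.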

Pre-processing is in its infancy, and until we have realtime typst query, I suspect it will remain so, but it’s coming and will hopefully be a nicer and safer interface than just raw shell access. Once we have that, I am sure that we will have a nice dedicated page in the docs (@reknih maybe?). Note that I intend on working on that right after gradients. (But anybody else can do it if they want!).

And I fully agree that a full guide on WASM plugins (in rust maybe, although @astrale-sharp made it far more generic than she needed to) and a full guide on pre-processing are a must have!

@PgBiel
Contributor

PgBiel commented Oct 2, 2023

In my opinion, we should focus on improving the existing external tool systems (query and WASM plugins) rather than prioritize shell escape in any form at the current stage.

@Andrew15-5
Contributor

Andrew15-5 commented Oct 3, 2023

creating a plugin that uses RustPython interpreter, it’s slower than CPython or PiPi, but it should be sufficient for most typst-related uses.

I know that I can compile RustPython to WASM, but I didn't find anything about compiling Python code to WASM (only to bytecode). And CPython is lacking a lot of features (didn't test it). And there is no PiPi, only PyPy.

plugins like a (good) PRNG

Interesting, since it is a "hot topic" in #984.

common hashing function

The one that I "commonly" use is Base64.

an image processing library

Oh, you mean like simple "image editor" where you can change colors, crop, blur and stuff like that? That could be very useful (but not for me).

Typst query is used for querying metadata tags from a document, a fairly common use afaik is getting notes out of slides. [...] get answers out of test question [...] plot data externally and then embed it automatically

See, this is what I don't get. First of all, do I have to use a variable to efficiently use "some value" in the document and also export it in the metadata()? Because I don't really understand why I would use metadata() inside of the Typst just so that I can then query it to some other tool, why not write this "metadata" directly in the external tool if it is not used in Typst in any way. But aside from that, sure you can "get answers out" or "plot data externally and then embed it automatically", b-but how it is done? Why do I even need to "get answers out", what is the end goal? And how can I embed plotted data automatically if it was plotted externally? I wanted to get some real world examples which include the workflow or how things are being automated, including the step of how exactly the typst query command is used (if Shellscript is used, then the source code of it would be perfect). Is this like a multistep process to generate a document, or is it some magic where you just write typst (c|w) and it does all the work for you?

realtime typst query

I don't quite get what this means (probably because of the previous paragraph in this comment). Does this refer to typst watch or something else?

Since you are busy, I'm not specifically asking you to answer all the questions, but anyone who can explain in depth (well, just explain the basics) these nifty little things called typst query & preprocessing. And BTW, I think this is still an appropriate place to continue this conversation, since it specifically targets (one of) the main question(s) of the issue (how to replace shell escape feature with other techniques/methods that are available right now to achieve the same result). And perhaps other people that probably exist could benefit from answers (while there are no guides in the documentation).

@Dherse
Sponsor Collaborator

Dherse commented Oct 3, 2023

From what I understand, you can just query all of the raw blocks that contain python, extract it, run it, insert the result using a JSON file or something else
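A minimal sketch of that extract/run/insert loop, assuming the snippet sources arrive as the JSON array of strings that `typst query ... --field text` prints. The function names and the `results.json` file name are made up for illustration, and each snippet is run in a fresh interpreter rather than in-process.

```python
import json
import subprocess
import sys


def run_snippet(source: str, timeout: float = 10.0) -> str:
    """Run one Python snippet in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=timeout,
    )
    # On failure, return stderr so the error shows up in the document.
    return result.stdout if result.returncode == 0 else result.stderr


def build_results(query_json: str) -> list:
    """Map a JSON array of snippet sources to the list of their outputs."""
    return [run_snippet(source) for source in json.loads(query_json)]


def write_results(query_json: str, path: str = "results.json") -> None:
    """Write the outputs to a JSON file (hypothetical name) for Typst to load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_results(query_json), f, ensure_ascii=False, indent=2)
```

On the Typst side, the file could then be loaded with `json("results.json")` and indexed with a counter, one entry per raw block.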

I don't quite get what this means (probably because of the previous paragraph in this comment). Does this refer to typst watch or something else?

Yes exactly, it would be like typst query but while typst watch, allowing you to get update/diffs to the state of queries

@iilyak

iilyak commented Oct 3, 2023

The way the typst project is structured allows anyone (with Rust skills (which I agree reduces the number of people)) to use it as a library. Which is actually a great way to implement tools which need to call out to scripts.

@astrale-sharp
Contributor

You won't be able to compile a Python script to WASM, but you can give these scripts to a Python interpreter that you compiled to WASM.

@Andrew15-5
Contributor

Oh, so I have to bundle the whole Python interpreter in WASM as well as a "copy" of Python program source code (that will be passed to the "WASM" Python interpreter). Isn't that expensive in terms of Typst package size? Or is there a way to only pick necessary features of Python in order to reduce interpreter WASM code size?

@astrale-sharp
Contributor

astrale-sharp commented Oct 4, 2023

For the sake of simplicity, I would use pre processing here.

your_file.typ

#let c = counter("python")
#show raw: it => [
    #c.step()
    #it
    #locate(loc =>read("genepy" + str(c.at(loc).at(0)) + ".txt" )) // you will have to comment this line when running the preprocessor since the files don't exist yet
]

Behold, python pre processing 

```py
def func():
    return 4

print(func() + func())
``` <fetch>


```py
for k in range(10):
    print(k,end='')


``` <fetch>

script.py

import sys
from contextlib import redirect_stdout
# to be used with `typst query your_file.typ "<fetch>" --field text | python3 script.py`

text = ''
for line in sys.stdin:
    text+=line

# cleans typst query outputs
text =  map( lambda x : x.rstrip(",").strip().strip('"') ,text.splitlines()[slice(1,-1)])

for (key,v) in enumerate(text):
    v = v.encode('utf-8').decode('unicode-escape')
    # prints from your snippets in genepy$k.txt 
    with open('genepy' + str(key+1) +".txt", 'w') as f:
        with redirect_stdout(f):
            try:
                exec(v)
            except:
                print("something wrong with the file")
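A possible variant of the script above, offered only as a sketch: since `typst query` prints a JSON array by default, the line stripping and `unicode-escape` decoding can be replaced by `json.loads`, which also copes with non-ASCII text. `run_snippets` and its `out_dir` parameter are names invented here. Unlike the original, each snippet gets fresh globals, so snippets do not share variables.

```python
import io
import json
import os
import traceback
from contextlib import redirect_stdout


def run_snippets(query_json: str, out_dir: str = ".") -> list:
    """Execute each snippet and write its stdout to genepy<N>.txt (N from 1)."""
    written = []
    for index, source in enumerate(json.loads(query_json), start=1):
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            try:
                # Fresh globals per snippet: no state leaks between blocks.
                exec(source, {"__name__": "__snippet__"})
            except Exception:
                # Keep the traceback in the output so the failure is visible.
                print(traceback.format_exc())
        path = os.path.join(out_dir, f"genepy{index}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(buffer.getvalue())
        written.append(path)
    return written
```

It would be fed the same way as before, e.g. by calling `run_snippets(sys.stdin.read())` at the end of the script and piping `typst query your_file.typ "<fetch>" --field text` into it.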

outputs

[screenshot of the rendered output]

What's still difficult

  • It would be nice to be able to try reading but fallback to something if the file is not present so it could compile
  • to not try reading files when using typst query, for instance with an annotation to indicate a line that mustn't be run in query mode
  • The documentation for typst query is lacking, I always find myself not understanding what I'm supposed to do, what the errors mean, etc
  • UPDATED It would also be nice to be able to have support for
    #show raw: it => [
        #c.step()
        #[#it <fetch>]
    ]

@astrale-sharp
Contributor

Something like what I wrote could probably be integrated in the documentation, btw, once the line in the Typst code doesn't need to be commented out in query mode

@laurmaedje

@Andrew15-5
Contributor

Andrew15-5 commented Oct 4, 2023

IMO, you should wrap your_file.typ code, script.py code and screenshot in separate <details> tags to make the comment more concise.

I have never seen redirect_stdout and slice(). redirect_stdout can be super useful. Isn't [slice(1, -1)] just a [1:-1]?
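For what it's worth, a quick check of that equivalence on a made-up list shaped like the query output (brackets on the first and last lines):

```python
lines = ["[", '  "print(1)",', '  "print(2)"', "]"]

# The subscript syntax a[1:-1] builds the same slice object as slice(1, -1).
inner = lines[slice(1, -1)]
assert inner == lines[1:-1]
assert inner == ['  "print(1)",', '  "print(2)"']
```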

your_file.typ

In the script.py you refer to it as nothing.typ for some reason.

  • It would be nice to be able to try reading but fallback to something if the file is not present so it could compile

See #2025.

  • to not try reading files when using typst query, for instance with an annotation to indicate a line that mustn't be run in query mode

You can't read files without reading them. You meant interpreting/executing/compiling/parsing them (I don't know what the right verb is with Typst). The point is that there should be no checks (whether the file exists) or no error messages.

#show raw: it => [
    #c.step()
    #[#it <fetch>]
]

No way! It seems that no one actually created an issue about this (until now: #2317). I also was disappointed a couple of times when this didn't work (this also prevents automatically making custom labels: #2007)

@Andrew15-5
Contributor

Andrew15-5 commented Oct 4, 2023

Honestly, this is a very niche example, I believe, so it doesn't really describe the potential or more useful usages of preprocessing. But at least I now get the general idea of how it all looks (works). Thanks. I believe with WASM RustPython you can just call a python() function and pass in the text of the Python source code like this:

#let code = ```py
print("Hello, world!")
```

#block(stroke: 1pt, radius: 0.2em, inset: 0.5em, code)
Output: #python(code.text)

(Unfortunately the new GitHub-approved Typst syntax highlighting has a bug, it would be great if someone can report it.)

In order to reuse the Python code 2 times (in Typst itself), you have to assign it to a variable, which looks less sexy, but I don't think there is any other way around it.

@laurmaedje
Member Author

In order to reuse in Typst itself the Python code 2 times you have to assign it to a variable, which looks less sexy, but I don't think there is any other way around it.

Well, you could make a raw show rule with a special tag like python-eval.

@Andrew15-5
Contributor

Well, you could make a raw show rule with a special tag like python-eval.

Yeah, this will be overall a much easier/cleaner approach, but this would entail that the python-eval would typeset the exact same template every time. Which generally is fine, but maybe something should be typeset a bit differently, then you would have to create python-eval-typset-differently. I mean, it's a nitpick, I guess.

@laurmaedje laurmaedje changed the title [RFC #5] Integration with external tools Integration with external tools Nov 14, 2023
@laurmaedje laurmaedje added proposal You think that something is a good idea or should work differently. and removed rfc labels Nov 14, 2023