RFC: Investigate a "postgres target" rust compiler target #32

Closed
8 tasks done
eeeebbbbrrrr opened this issue Apr 4, 2022 · 22 comments

@eeeebbbbrrrr
Contributor

eeeebbbbrrrr commented Apr 4, 2022

I'd like to see what it would look like to design (develop?) and otherwise maintain a custom rust compilation target for Postgres that could give us a nearly "safe" (as Rust defines it) and "trusted" (as Postgres defines a procedural language) plrust.

As I know very little about this topic, please take these points with a grain of salt, but I suppose we need:

  • x86_64 support
  • aarch64 support
  • Linux only (for now)
  • to disallow most (all?) access to the host operating system (ie, no disk or socket I/O, etc, etc), mainly no filesystem access
  • to disallow directly calling into the active postmaster process (ie, no ability to reach into Postgres' memory space, despite us running in it)
  • strictly not WASM... we'd still be wanting to build CPU-native shared libraries

And some other points of consideration:

  • how to gracefully handle rust panics and have them interoperate with Postgres' transaction system
  • memory allocation (do we just use malloc/free, or can we instead use Postgres' palloc/pfree functions?)

Please feel free to add other ideas here.

@eeeebbbbrrrr
Contributor Author

eeeebbbbrrrr commented Apr 4, 2022

Postgres' definition of "TRUSTED" is here: https://www.postgresql.org/docs/14/xplang-install.html

The optional key word TRUSTED specifies that the language does not grant access to data that the user would not otherwise have. Trusted languages are designed for ordinary database users (those without superuser privilege) and allows them to safely create functions and procedures. Since PL functions are executed inside the database server, the TRUSTED flag should only be given for languages that do not allow access to database server internals or the file system. The languages PL/pgSQL, PL/Tcl, and PL/Perl are considered trusted; the languages PL/TclU, PL/PerlU, and PL/PythonU are designed to provide unlimited functionality and should not be marked trusted.

I see little reason for us to go beyond that and invent our own definition.

Like, to me, this says that network access is just fine. Perhaps even the sound card and GPU.

@workingjubilee
Contributor

I will refer to these hypothetical targets
x86_64-unknown-linux-postgres
aarch64-unknown-linux-postgres

This follows the established convention of using the last term in the target tuple to define the ABI and necessary runtime support (usually libc functions) of the target. The fundamental question to answer is, "Is it possible to compile Rust code to a target that uses Postgres's own functions for allocation and the like at the core of its runtime support, thus eliminating the impedance mismatch between Rust's runtime and Postgres's runtime?"

@Hoverbear
Contributor

What about postgres versions? Their headers differ somewhat dramatically at times.

  • x86_64-unknown-linux-postgres-10
  • x86_64-unknown-linux-postgres-11
  • x86_64-unknown-linux-postgres-12
  • x86_64-unknown-linux-postgres-13
  • x86_64-unknown-linux-postgres-14

@workingjubilee
Contributor

I am going to focus on trying to get the latest Postgres working and pretend that doesn't happen in ways that actually impact the extent of alloc/std we want to support, but yes, something like that might need to happen when I stop pretending.

@workingjubilee
Contributor

workingjubilee commented Apr 19, 2022

Since trying to stuff Rust fully "into" the Postgres server's allocation and error handling logic is the main goal for this, I spent a fair amount of time studying Postgres's memory contexts for allocation and the like, how it handles memory allocations and pointers and aborts, how procedural language execution happens, the ereport mechanism, PGX's integration with it, and going over old Rust RFCs and details on how unsafe pointers should be handled, before jumping in. I don't need to know absolutely every detail to proceed, but the main concerns here are about allocation and unwinding, making this two to five different kinds of unsafe, depending. As a result I have a few in-progress branches off PGX that tweak the way PGX handles some of those for correctness and such (and hopefully also performance, as some functions wrangle more overhead than they need to) and will hopefully make things a bit easier to refactor in the future around this.

I didn't see anything that I think will prove to be a blocker and altogether I think it may actually wind up proving even easier than I thought, by, in the crate that PL/Rust code gets compiled into

  • override the #[panic_handler] with logic to eventually ereport
  • override the #[global_allocator] to call into Rust code wrapping palloc/pfree
  • mark the crate as #![no_std] but with extern crate alloc bringing in normal Rust collections, which, since they default to the global allocator, now call into Postgres's palloc; if anything overcommits and thus panics, it aborts via Postgres as well.
  • modularize existing panic handling code out from PGX
  • modularize allocator control, etc. code into an individual crate
  • develop a new std as a new crate, for user convenience and to support Postgres-trusted API surface
  • make a convenient build system and compare things like a target-spec.json or using --sysroot
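
To make the allocator bullet above concrete, here is a minimal sketch of a `#[global_allocator]` override. It is not the PL/Rust or PGX implementation: `mock_palloc` and `mock_pfree` are hypothetical stand-ins that forward to the system allocator so the example runs outside a Postgres backend, and the call counter exists only to show that ordinary collections go through the override.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts allocation calls so the effect of the override is observable.
static PALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for Postgres's `palloc`. A real build would call the
/// C function and would have to deal with alignment itself, since `palloc`
/// only takes a size.
unsafe fn mock_palloc(layout: Layout) -> *mut u8 {
    PALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    unsafe { System.alloc(layout) }
}

/// Hypothetical stand-in for Postgres's `pfree`.
unsafe fn mock_pfree(ptr: *mut u8, layout: Layout) {
    unsafe { System.dealloc(ptr, layout) }
}

/// Zero-sized allocator type that routes every request to the stand-ins.
pub struct PostgresAllocator;

unsafe impl GlobalAlloc for PostgresAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { mock_palloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { mock_pfree(ptr, layout) }
    }
}

// Every `Box`, `Vec`, `String`, etc. in this crate now allocates through
// `PostgresAllocator`.
#[global_allocator]
static GLOBAL: PostgresAllocator = PostgresAllocator;

/// Number of allocation requests seen so far.
pub fn palloc_calls() -> usize {
    PALLOC_CALLS.load(Ordering::Relaxed)
}
```

Collecting into a `Vec` after this is in place should increase `palloc_calls()`, which is roughly the "prove to myself it was working" check one would want before wiring in the real functions.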

This likely means we wind up with a postgres-alloc and/or a postgres-panic crate in a workspace somewhere. And if necessary, recompiling std to guarantee it offers this interface and only this interface can be done, but that may be mmmostly unnecessary. May be more necessary? Unsure.

As far as my understanding currently goes, this would also make the current "dance" around pg_guard to catch Postgres's ereport and unwind Rust correctly mostly unnecessary by making sure that it would correctly clean up all allocations acquired. It would mean the resulting Rust code would have something that looks more like "abort semantics", and thus any clever cleanup mechanisms users may want would simply terminate instead, but there is a simple, sad truth of Rust:

Destructors are not guaranteed to be run.
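
That statement can be demonstrated in entirely safe Rust. The sketch below uses a made-up `Noisy` type whose `Drop` impl bumps a counter: the destructor runs when the value leaves scope normally, but not when the value is passed to `std::mem::forget`.

```rust
use std::cell::Cell;

/// Illustrative type: increments the shared counter when dropped.
pub struct Noisy<'a>(pub &'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count after a normal scope exit and after a `forget`.
pub fn drops_observed() -> (u32, u32) {
    let counter = Cell::new(0);
    {
        let _dropped = Noisy(&counter);
        // `_dropped` goes out of scope here, so its destructor runs.
    }
    let after_scope = counter.get();

    // `mem::forget` is a safe function; the destructor never runs.
    std::mem::forget(Noisy(&counter));
    (after_scope, counter.get())
}
```

Both counts come out the same, because the forgotten value contributes nothing. Code that relies on `Drop` for correctness already has to tolerate this, which is why abort-like semantics are less of a departure than they first appear.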

@eeeebbbbrrrr
Contributor Author

Destructors are not guaranteed to be run.

True. And in practice I don't know that they'll even come up as "necessary" within the confines of a single pl/rust function.

However, if we override the global allocator to use Postgres, it's important to point out that we'll lose the ability to do threading. Postgres is strictly not thread-safe, most especially its MemoryContext system.

You might think, "well, we could wrap the global allocator functions in a Mutex", but the overhead aside, we can't control what the main thread does as it's Postgres proper and won't be routed through Rust's global allocator.

@workingjubilee
Contributor

workingjubilee commented Apr 19, 2022

We lose it twice, actually:

Rust's libcore doesn't have std::thread, so #![no_std] means no threads.
It might be possible to do something clever to recover this ability, but it's definitely not my focus at the moment.

@eeeebbbbrrrr
Contributor Author

eeeebbbbrrrr commented Apr 19, 2022

So... @Hoverbear showed me some "hack" examples where you can still pull in std:: directly, even with the "no std" thing. And we can't control what external crates do -- or can we?

@workingjubilee
Contributor

workingjubilee commented Apr 30, 2022

Alright, there were several delays due to... various points of confusion, some of them entertaining, most of them not, but I have now managed to get a global allocator override working. More of the work was attempting to figure out how to push a message back out effectively so I could prove to myself it was working as intended, honestly. It's enormously crude right now as I just directly inject the allocator's code and the #![no_std] annotations into the compiled code, But Hey, It Works, and it's not really that far from what PL/Rust currently looks like anyways. The proper implementation will probably be as a new crate in some repo or another, and... otherwise will look completely identical, it just will be compiled by cargo instead.

What I haven't done is successfully unglue and override the panic handler... that requires more digging into PGX, because PL/Rust uses PGX to build code for obvious reasons... which means it links in PGX's panic handling code already, which does not play excellently with trying to override the panic handler a second time, causing linking errors. However, this is just solved by taking a hacksaw to PGX and telling it to not ALWAYS add code at various points, so as to make its panic handler modular, which is easily enough done. PL/Rust code does run already with these changes, it's just now in the awkward position of doing needless work.
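
For readers who have not seen what a panic guard at a function boundary looks like, here is a minimal sketch using `std::panic::catch_unwind`. This is a different mechanism from overriding `#[panic_handler]` and it is not PGX's actual code; the `guard` helper is hypothetical, and a real integration would hand the message to Postgres's error reporting instead of returning a `String`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs `f`, turning a Rust panic into an `Err` carrying the panic message
/// instead of letting it unwind past this function.
pub fn guard<T>(f: impl FnOnce() -> T) -> Result<T, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

Note that this only works when the crate is compiled with unwinding panics; with `panic = "abort"` there is nothing to catch.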

@workingjubilee
Contributor

workingjubilee commented Apr 30, 2022

As far as pulling in std in spite of the crate being #![no_std]: I am not entirely sure how #![no_std] is circumvented here if you directly annotate the crate, but I am very interested in hearing the details of the hack.

The way to get around that, I suspect, would be to use a "bare metal" target that corresponds to "use the current platform's binary object format, but without the platform's C runtime". That way, the contents of std that require runtime support beyond just an allocator would be empty, as one expects. These would be x86_64-unknown-linux-none and possibly aarch64-unknown-linux-none, which are not builtin Rust targets but which I expect would be easily specified.

That would then bring things within striking distance of "actually for real we're just implementing our own std now and using Postgres as the OS".

@eeeebbbbrrrr
Contributor Author

Knock yourself out:

#![no_std]
extern crate std;

fn main() {
    let s = std::string::String::from("hi");
    std::eprintln!("{s}")
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=241f0043175eadfbe65b946599350262

@workingjubilee
Contributor

Oh yeah, extern crate std. Yeah, I am pretty sure that's resoluble by just specifying std again, using rustc --extern to control the mapping, since PL/Rust implies some kind of compiler control, or the unstable -Zbuild-std at worst. The baseline could be an empty std that merely reexports alloc.
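
As a rough illustration of "an empty std that merely reexports alloc", the sketch below builds a facade as an ordinary module. The name `pgstd` and the choice of re-exported modules are assumptions made for the example; a real replacement would be its own crate that the compiler is told to treat as `std`.

```rust
/// Hypothetical facade: exposes only allocation-backed parts of the standard
/// library. There is deliberately no `fs`, `net`, or `thread` here.
pub mod pgstd {
    extern crate alloc;

    pub use self::alloc::{boxed, collections, string, vec};
}

/// Code written against the facade can still use ordinary collections.
pub fn greeting() -> pgstd::string::String {
    let mut parts = pgstd::vec::Vec::new();
    parts.push("hello");
    parts.push("postgres");
    parts.join(" ")
}
```

Anything outside the re-exported set, such as `pgstd::fs`, simply fails to resolve at compile time, which is the property being sought for third-party crates as well.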

@eeeebbbbrrrr
Contributor Author

I assume this would also prevent other 3rd party crates that might re-export std from exporting the real std and would instead use ours?

@workingjubilee
Contributor

Correct. I confirmed last night that we have to build a "full-fledged" new std instead of a "mere" override, as a --extern itself doesn't do enough. So -Zbuild-std it is.

@workingjubilee
Contributor

Currently blocked on #49 which requires a PR to PGX which I had been drafting. Likely need to toggle a feature based on inclusion of the Postgres allocator or not.

@workingjubilee
Contributor

That was unblocked (mostly) and also I got everything building for postgrestd and just need to document and push everything up properly now. This will depend on several forked crates for the first version probably.

@workingjubilee
Contributor

As of #56 a preliminary x86_64-postgres-linux-gnu tuple works on GNU/Linux with Postgres 14, and it passes build/test cycles on that (though I... somehow managed to break macOS in CI, even without postgrestd???)

The main tradeoff things ran up against is that while I was researching things, it proved untenable to actually carve out entire modules of Rust code at compile time (which was discussed out-of-band as a possible desirable), and I was instead eventually forced to shift towards a strategy of using stubs instead (that can still return failure, success, or whatever based on the implementation).
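
A minimal sketch of what the stub strategy means in practice, assuming a made-up `open_file` function instead of the real `std::sys` internals: the API stays present so callers still compile, but the implementation unconditionally reports that the operation is unsupported.

```rust
use std::io;

/// Hypothetical stub: the function exists, but never touches the filesystem.
pub fn open_file(_path: &str) -> io::Result<Vec<u8>> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "filesystem access is not available on this target",
    ))
}

/// Callers keep compiling and handle the failure through the normal
/// `Result` path.
pub fn read_config(path: &str) -> String {
    match open_file(path) {
        Ok(bytes) => String::from_utf8_lossy(&bytes).into_owned(),
        Err(_) => String::from("default"),
    }
}
```

The tradeoff compared with carving modules out entirely is that misuse shows up at run time as an error value, not at compile time as a missing item.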

@Hoverbear
Contributor

I've been playing with it this morning and it seems to work quite well!!!

@workingjubilee
Contributor

Yesterday I realized and implemented everything that had to happen to effectuate a new version of the Rust target-specific std::sys implementation that has most of the security properties we're looking for. It should be relatively resilient when ported to different Unix-y systems, or indeed new versions of the Rust compiler, though it is in and of itself largely self-contained.

I haven't explored all the benefits and costs of this implementation yet but it should basically eliminate any questions about safety from Safe Rust while also minimizing the number of forked dependencies needed in the future.

@workingjubilee
Contributor

Main insight from spending a bit of time investigating it today: Aside from blocking off std::fs from doing anything of interest, the new variant + the changes in pgx in pgcentralfoundation/pgrx#607 means no longer having to juggle any more forked dependencies just for basic user crates! No guarantee random ecosystem crates won't break, still.

@workingjubilee
Contributor

workingjubilee commented Jul 21, 2022

Even libc also no longer requires forking with the revised approach!

As panic handling currently more or less Just Works (well, after I fixed a few cases which surfaced inadequacies in pgx with respect to "suddenly being used in ways neither it nor Rust were ever really expecting to be used"), I have been spending more time on examining the general concern with regard to unsafe code and preventing calling directly into the postmaster process in ways we don't want. This will likely require extending PL/Rust and/or PGX a fair amount with capabilities for analyzing code, or at least doing more interesting build steps.

One possibility that was brought up is simply running a "pre-build" step where we cargo fetch all dependencies and then compile them in advance, controlling the warning levels in ways that cargo normally would not do, to verify that the dependency graph does not contain unsafe code... unless we would otherwise permit unsafe code in that dependency.
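
One building block for such a check is rustc's `unsafe_code` lint. The sketch below applies it at `forbid` level to a single module whose name and contents are made up for illustration; a pre-build step would presumably set the same lint level for whole dependency crates from the command line instead of editing their source.

```rust
/// Hypothetical user module compiled with `unsafe` rejected outright.
#[forbid(unsafe_code)]
pub mod user_function {
    /// Plain safe code compiles as usual.
    pub fn sum_lengths(words: &[&str]) -> usize {
        // Adding an `unsafe { ... }` block anywhere in this module would be
        // a hard compile error, and `forbid` cannot be overridden by an
        // inner `#[allow(unsafe_code)]`.
        words.iter().map(|w| w.len()).sum()
    }
}
```

This only covers `unsafe` written in the checked code itself; it says nothing about safe wrappers around unsafe code in crates that were explicitly permitted.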

@workingjubilee
Contributor

I still want to modularize the existing panic handler code but it does appear to work, and everything seems to work with aarch64 support as well.


3 participants