RFC: Investigate a "postgres target" rust compiler target #32

eeeebbbbrrrr · 2022-04-04T18:18:52Z

I'd like to see what it would look like to design (develop?) and otherwise maintain a custom rust compilation target for Postgres that could give us a nearly "safe" (as Rust defines it) and "trusted" (as Postgres defines a procedural language) plrust.

As I know very little about this topic, please take these points with a grain of salt, but I suppose we need:

x86_64 support
aarch64 support
Linux only (for now)
to disallow most (all?) access to the host operating system ~~(ie, no disk or socket I/O, etc, etc)~~, mainly no filesystem access
to disallow directly calling into the active postmaster process (ie, no ability to reach into Postgres' memory space, despite us running in it)
strictly not WASM... we'd still be wanting to build CPU-native shared libraries

And some other points of consideration:

how to gracefully handle rust panics and have them interoperate with Postgres' transaction system
memory allocation (do we just use malloc/free, or can we instead use Postgres' palloc/pfree functions?)

Please feel free to add other ideas here.


eeeebbbbrrrr · 2022-04-04T20:41:11Z

Postgres' definition of "TRUSTED" is here: https://www.postgresql.org/docs/14/xplang-install.html

The optional key word TRUSTED specifies that the language does not grant access to data that the user would not otherwise have. Trusted languages are designed for ordinary database users (those without superuser privilege) and allows them to safely create functions and procedures. Since PL functions are executed inside the database server, the TRUSTED flag should only be given for languages that do not allow access to database server internals or the file system. The languages PL/pgSQL, PL/Tcl, and PL/Perl are considered trusted; the languages PL/TclU, PL/PerlU, and PL/PythonU are designed to provide unlimited functionality and should not be marked trusted.

I see little reason for us to go beyond that and inventing our own definition.

Like, to me, this says that network access is just fine. Perhaps even the sound card and GPU.

workingjubilee · 2022-04-04T21:11:24Z

I will refer to these hypothetical targets:
x86_64-unknown-linux-postgres
aarch64-unknown-linux-postgres

This follows the established convention of using the last term in the target tuple to define the ABI and necessary runtime support (usually libc functions) of the target. The fundamental question to answer is, "Is it possible to compile Rust code to a target that uses Postgres's own functions for allocation and the like at the core of its runtime support, thus eliminating the impedance mismatch between Rust's runtime and Postgres's runtime?"

Hoverbear · 2022-04-04T21:13:19Z

What about postgres versions? Their headers differ somewhat dramatically at times.

x86_64-unknown-linux-postgres-10
x86_64-unknown-linux-postgres-11
x86_64-unknown-linux-postgres-12
x86_64-unknown-linux-postgres-13
x86_64-unknown-linux-postgres-14

workingjubilee · 2022-04-04T23:30:49Z

I am going to focus on trying to get the latest Postgres working and pretend that doesn't happen in ways that actually impact the extent of alloc/std we want to support, but yes, something like that might need to happen when I stop pretending.

workingjubilee · 2022-04-19T00:43:57Z

Since trying to stuff Rust fully "into" the Postgres server's allocation and error handling logic is the main goal for this, I spent a fair amount of time studying Postgres's memory contexts for allocation and the like, how it handles memory allocations and pointers and aborts, how procedural language execution happens, the ereport mechanism, PGX's integration with it, and going over old Rust RFCs and details on how unsafe pointers should be handled, before jumping in. I don't need to know absolutely every detail to proceed, but the main concerns here are about allocation and unwinding, making this two to five different kinds of unsafe, depending. As a result I have a few in-progress branches off PGX that tweak the way PGX handles some of those for correctness and such (and hopefully also performance, as some functions wrangle more overhead than they need to) and will hopefully make things a bit easier to refactor in the future around this.

I didn't see anything that I think will prove to be a blocker, and altogether I think it may actually wind up proving even easier than I thought, by doing the following in the crate that PL/Rust code gets compiled into:

override the #[panic_handler] with logic to eventually ereport
override the #[global_allocator] to call into Rust code wrapping palloc/pfree
mark the crate as #![no_std] but with extern crate alloc bringing in normal Rust collections that, since they default to the global allocator, now call into Postgres's palloc; if anything overcommits and thus panics, it aborts via Postgres also.
modularize existing panic handling code out from PGX
modularize allocator control, etc. code into an individual crate
develop a new std as a new crate, for user convenience and to support Postgres-trusted API surface
make a convenient build system and compare things like a target-spec.json or using --sysroot
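To make the allocator item in the list above concrete, here is a minimal sketch of a `#[global_allocator]` override in ordinary Rust. It is not the PL/Rust or PGX implementation: `palloc_stand_in` and `pfree_stand_in` are hypothetical placeholders that forward to the system allocator so the example runs outside Postgres, and `PALLOC_CALLS` exists only to make the override observable.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts calls so the override can be observed from ordinary code.
pub static PALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for Postgres' `palloc`. A real wrapper would call
/// into the server and would have to think about memory contexts and about
/// alignments larger than `palloc` guarantees.
unsafe fn palloc_stand_in(layout: Layout) -> *mut u8 {
    PALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    unsafe { System.alloc(layout) }
}

/// Hypothetical stand-in for Postgres' `pfree`.
unsafe fn pfree_stand_in(ptr: *mut u8, layout: Layout) {
    unsafe { System.dealloc(ptr, layout) }
}

/// Zero-sized type that routes every Rust allocation through the stand-ins.
pub struct PostgresAllocator;

unsafe impl GlobalAlloc for PostgresAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { palloc_stand_in(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { pfree_stand_in(ptr, layout) }
    }
}

// From here on, `Vec`, `String`, `Box`, and friends allocate through
// `PostgresAllocator` without any change to the code that uses them.
#[global_allocator]
static GLOBAL: PostgresAllocator = PostgresAllocator;
```

The point of the design is that user code does not need to know about the override: the collections in `alloc` always go through whatever the global allocator is.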

This likely means we wind up with a postgres-alloc and/or a postgres-panic crate in a workspace somewhere. ~~And if necessary, recompiling std to guarantee it offers this interface and only this interface can be done, but that may be mmmostly unnecessary.~~ May be more necessary? Unsure.

As far as my understanding currently goes, this would also make the current "dance" around pg_guard to catch Postgres's ereport and unwind Rust correctly mostly unnecessary by making sure that it would correctly clean up all allocations acquired. It would mean the resulting Rust code would have something that looks more like "abort semantics", and thus any clever cleanup mechanisms users may want would simply terminate instead, but there is a simple, sad truth of Rust:

Destructors are not guaranteed to be run.
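As a small illustration of that statement in entirely safe Rust, the sketch below uses `std::mem::forget`, which is a safe function, to skip a destructor. The `Guard` type and both helper functions are hypothetical names for this example only.

```rust
use std::cell::Cell;

/// Increments the shared counter when dropped.
pub struct Guard<'a>(pub &'a Cell<u32>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// The guard leaves scope normally, so its destructor runs once.
pub fn drops_after_scope_exit() -> u32 {
    let drops = Cell::new(0);
    {
        let _guard = Guard(&drops);
    }
    drops.get()
}

/// The guard is passed to `mem::forget`, so its destructor never runs.
pub fn drops_after_forget() -> u32 {
    let drops = Cell::new(0);
    std::mem::forget(Guard(&drops));
    drops.get()
}
```

Code that relies on `Drop` for correctness rather than convenience is therefore already on shaky ground, independent of how Postgres aborts a transaction.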

eeeebbbbrrrr · 2022-04-19T13:32:59Z

Destructors are not guaranteed to be run.

True. And in practice I don't know that they'll even come up as "necessary" within the confines of a single pl/rust function.

However, if we override the global allocator to use Postgres, it's important to point out that we'll lose the ability to do threading. Postgres is strictly not thread-safe, most especially its MemoryContext system.

You might think, "well, we could wrap the global allocator functions in a Mutex", but the overhead aside, we can't control what the main thread does as it's Postgres proper and won't be routed through Rust's global allocator.

workingjubilee · 2022-04-19T21:17:06Z

We lose it twice, actually:

Rust's libcore doesn't have std::thread, so #![no_std] means no threads.
It might be possible to do something clever to recover this ability, but it's definitely not my focus at the moment.

eeeebbbbrrrr · 2022-04-19T21:20:29Z

So... @Hoverbear showed me some "hack" examples where you can still pull in std:: directly, even with the "no std" thing. And we can't control what external crates do -- or can we?

workingjubilee · 2022-04-30T01:44:32Z

Alright, there were several delays due to... various points of confusion, some of them entertaining, most of them not, but I have now managed to get a global allocator override working. More of the work was attempting to figure out how to push a message back out effectively so I could prove to myself it was working as intended, honestly. It's enormously crude right now as I just directly inject the allocator's code and the #![no_std] annotations into the compiled code, But Hey, It Works, and it's not really that far from what PL/Rust currently looks like anyways. The proper implementation will probably be as a new crate in some repo or another, and... otherwise will look completely identical, it just will be compiled by cargo instead.

What I haven't done is successfully unglue and override the panic handler... that requires more digging into PGX, because PL/Rust uses PGX to build code for obvious reasons... which means it links in PGX's panic handling code already, which does not play excellently with trying to override the panic handler a second time, causing linking errors. However, this is solved simply by taking a hacksaw to PGX and telling it to not ALWAYS add code at various points, so as to make its panic handler modular, which is easily enough done. PL/Rust code does run already with these changes, it's just now in the awkward position of doing needless work.
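For readers unfamiliar with the idea of routing panics into an error-reporting mechanism, here is a rough analogue in ordinary `std` Rust. It uses `std::panic::set_hook` rather than `#[panic_handler]`, because `#[panic_handler]` can only be defined when `std` is not linked. `fake_ereport` and `REPORTED` are hypothetical stand-ins for Postgres' ereport, and none of this is PGX's actual code.

```rust
use std::panic;
use std::sync::Mutex;

/// Hypothetical sink standing in for Postgres' error reporting.
pub static REPORTED: Mutex<Vec<String>> = Mutex::new(Vec::new());

fn fake_ereport(message: String) {
    REPORTED.lock().unwrap().push(message);
}

/// Runs `f` with a panic hook that forwards the panic message to
/// `fake_ereport`, then restores the previous hook.
/// Returns `true` if `f` panicked.
pub fn run_reporting_panics<F>(f: F) -> bool
where
    F: FnOnce() + panic::UnwindSafe,
{
    let previous = panic::take_hook();
    panic::set_hook(Box::new(|info| fake_ereport(info.to_string())));
    let panicked = panic::catch_unwind(f).is_err();
    panic::set_hook(previous);
    panicked
}
```

Calling `run_reporting_panics(|| panic!("boom"))` returns `true` and leaves a message containing `boom` in `REPORTED`. The panic hook is process-global, which mirrors the conflict described above: only one piece of code can own it at a time.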

workingjubilee · 2022-04-30T04:16:11Z

As far as pulling in std in spite of the crate being #![no_std]: I am not entirely sure how #![no_std] is circumvented here if you directly annotate the crate, but I am very interested in hearing the details of the hack.

The way to get around that, I suspect, would be to use a "bare metal" target that corresponds to "use the current platform's binary object format, but without the platform's C runtime". That way, the contents of std that require runtime support beyond just an allocator would be empty, as one expects. This would be the x86_64-unknown-linux-none and a possible aarch64-unknown-linux-none target that is not a builtin Rust target but I expect would be easily specified.

That would then bring things within striking distance of "actually for real we're just implementing our own std now and using Postgres as the OS".

eeeebbbbrrrr · 2022-05-01T18:34:07Z

Knock yourself out:

#![no_std]
extern crate std;

fn main() {
    let s = std::string::String::from("hi");
    std::eprintln!("{s}")
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=241f0043175eadfbe65b946599350262

workingjubilee · 2022-05-01T22:18:21Z

Oh yeah, extern crate std. Yeah, I am pretty sure that's resoluble by just specifying std again, using rustc --extern to control the mapping, since PL/Rust implies some kind of compiler control, or the unstable -Zbuild-std at worst. The baseline could be an empty std that merely reexports alloc.
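A rough sketch of what "an empty std that merely reexports alloc" could look like follows. In the setup described it would be a separate crate presented to rustc under the name `std`; here a module named `pgstd` (a hypothetical name) stands in for it so the example compiles on its own, and the choice of reexported modules is an assumption.

```rust
/// Hypothetical facade: only allocation-backed modules are exposed.
pub mod pgstd {
    extern crate alloc;

    pub use alloc::boxed;
    pub use alloc::collections;
    pub use alloc::string;
    pub use alloc::vec;
    // Deliberately no `fs`, `net`, `process`, or `thread` modules.
}

/// User code written against the facade sees the usual collection types.
pub fn greeting() -> pgstd::string::String {
    let mut parts = pgstd::vec::Vec::new();
    parts.push("hello");
    parts.push("postgres");
    parts.join(" ")
}
```

Anything that tries to name `pgstd::fs` or `pgstd::thread` fails to compile, which is the property such a baseline is after.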

eeeebbbbrrrr · 2022-05-02T12:55:17Z

I assume this would also prevent other 3rd party crates that might re-export std from exporting the real std and would instead use ours?

workingjubilee · 2022-05-03T17:31:42Z

Correct. I confirmed last night that we have to build a "full-fledged" new std instead of a "mere" override, as a --extern itself doesn't do enough. So -Zbuild-std it is.

workingjubilee · 2022-05-31T19:55:39Z

Currently blocked on #49 which requires a PR to PGX which I had been drafting. Likely need to toggle a feature based on inclusion of the Postgres allocator or not.

workingjubilee · 2022-06-24T02:36:54Z

That was unblocked (mostly) and also I got everything building for postgrestd and just need to document and push everything up properly now. This will depend on several forked crates for the first version probably.

workingjubilee · 2022-06-30T09:40:56Z

As of #56 a preliminary x86_64-postgres-linux-gnu tuple works on GNU/Linux with Postgres 14, and it passes build/test cycles on that (though I... somehow managed to break macOS in CI, even without postgrestd???)

The main tradeoff things ran up against is that while I was researching things, it proved untenable to actually carve out entire modules of Rust code at compile time (which was discussed out-of-band as a possible desirable), and I was instead eventually forced to shift towards a strategy of using stubs instead (that can still return failure, success, or whatever based on the implementation).
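To illustrate the stub strategy in general terms, here is a minimal sketch. The function names are hypothetical and this is not the postgrestd source: the API surface stays in place so callers still compile, and each stub decides at run time whether to fail or to succeed as a no-op.

```rust
use std::io;

/// Hypothetical stub: the function still exists, but always fails.
pub fn stub_open(_path: &str) -> io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "filesystem access is disabled",
    ))
}

/// Hypothetical stub: harmless operations can simply succeed as no-ops.
pub fn stub_flush() -> io::Result<()> {
    Ok(())
}

/// Callers keep compiling unchanged and see an ordinary `io::Error`.
pub fn load_config(path: &str) -> io::Result<()> {
    stub_open(path)?;
    stub_flush()
}
```

Compared with removing modules at compile time, stubs keep ecosystem crates building, at the cost of turning some would-be compile errors into run-time errors.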

Hoverbear · 2022-06-30T17:08:03Z

I've been playing with it this morning and it seems to work quite well!!!

workingjubilee · 2022-07-13T21:26:10Z

Yesterday I realized and implemented everything that had to happen to effectuate a new version of the Rust target-specific std::sys implementation that has most of the security properties we're looking for. It should be relatively resilient when ported to different Unix-y systems, or indeed new versions of the Rust compiler, though it is in and of itself largely self-contained.

I haven't explored all the benefits and costs of this implementation yet but it should basically eliminate any questions about safety from Safe Rust while also minimizing the number of forked dependencies needed in the future.

workingjubilee · 2022-07-14T02:50:04Z

Main insight from spending a bit of time investigating it today: Aside from blocking off std::fs from doing anything of interest, the new variant + the changes in pgx in pgcentralfoundation/pgrx#607 means no longer having to juggle any more forked dependencies just for basic user crates! No guarantee random ecosystem crates won't break, still.

workingjubilee · 2022-07-21T23:56:44Z

Even libc also no longer requires forking with the revised approach!

As panic handling currently more or less Just Works (well, after I fixed a few cases which surfaced inadequacies in pgx with respect to "suddenly being used in ways neither it nor Rust were ever really expecting to be used"), I have been spending more time on examining the general concern with regard to unsafe code and preventing calling directly into the postmaster process in ways we don't want. This will likely require extending PL/Rust and/or PGX a fair amount with capabilities for analyzing code, or at least doing more interesting build steps.

One possibility that was brought up is simply running a "pre-build" step where we cargo fetch all dependencies and then compile them in advance, controlling the warning levels in ways that cargo normally would not do, to verify that the dependency graph does not contain unsafe code... unless we would otherwise permit unsafe code in that dependency.

workingjubilee · 2022-09-27T04:59:44Z

I still want to modularize the existing panic handler code but it does appear to work, and everything seems to work with aarch64 support as well.

eeeebbbbrrrr assigned workingjubilee Apr 4, 2022

johnrballard mentioned this issue Jul 27, 2022

RFC: Trusted Language Handling #25

Closed

workingjubilee added the rfc label Aug 18, 2022

workingjubilee added the build label Sep 9, 2022

workingjubilee closed this as completed Sep 27, 2022
