Skip to content

Conversation

@yrashk
Copy link
Contributor

@yrashk yrashk commented Sep 26, 2022

No description provided.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh cool. This looks interesting.
A test confirming this works would be fantastic!

This also changes termination API for background workers a little bit,
enforcing shutdown-scoped limitations on the functionality available.
@yrashk
Copy link
Contributor Author

yrashk commented Sep 29, 2022

@workingjubilee I've added the first draft of the test; please let me know what you think.

It's not really doing anything as it hooks into postgres initialization

Solution: remove it and put a note in comments
@yrashk
Copy link
Contributor Author

yrashk commented Oct 13, 2022

@workingjubilee I've updated this PR to be based on the latest develop and removed shmem startup hook as it doesn't make sense in the dynamic registration scenario.

I've considered changing the API for the startup hook to make it impossible to use dynamic registration if the hook is provided, but for now, I decided to keep it as is unless you want me to. Something like:

pub struct BackgroundWorkerBuilder<const Dynamic: bool = true> { .... }

pub fn enable_shmem_access(mut self: Self) -> Self { ... }
pub fn shmem_startup_hook(mut self: Self, unsafe extern "C" fn()) -> BackgroundWorkerBuilder<false> { ... }

impl BackgroundWorkerBuilder<true> {
  pub fn load_dynamic(self: Self) -> BackgroundWorkerHandle { ... }
}

@yrashk
Copy link
Contributor Author

yrashk commented Oct 13, 2022

@workingjubilee @eeeebbbbrrrr I've updated the documentation as requested.

This is rather unpleasant behaviour as it is unrecoverable.

Solution: check if the PID was correctly configured and return an error
if it wasn't
@yrashk
Copy link
Contributor Author

yrashk commented Oct 13, 2022

@eeeebbbbrrrr I've replaced the blocking on wait_* with an error.

It now properly reflects the fact that dynamic background workers are
supported.
@eeeebbbbrrrr
Copy link
Contributor

I think this looks good now. Let’s have @workingjubilee confirm next week before we merge.

Thanks for the work, @yrashk!

@yrashk
Copy link
Contributor Author

yrashk commented Oct 17, 2022

Anything else I can do here to make this ready to be merged?

Copy link
Contributor

@thomcc thomcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, but there might be something I'm missing, so it's probably better if @workingjubilee takes a look

@thomcc thomcc requested a review from workingjubilee October 17, 2022 16:52
In some cases, it is not desirable for the builder to load the
background worker. For example, when actual loading is handled by some
external code that uses this structure from the C API.

Solution: make `BackgroundWorkerBuilder::pg_sys_worker` public
Its naming and scoping is problematic, it's not obvious what it does.

Solution: convert it into an `Into` trait implementation
Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the presence of tests makes this much easier to review. I have a few nits that aren't terribly important, but I also have a number of design questions about how this is being handled and some edge-case-related concerns before this can be merged. However it looks like we're getting close.

let worker = BackgroundWorkerBuilder::new("dynamic_bgworker")
.set_library("pgx_tests")
.set_function("bgworker")
.set_argument(123i32.into_datum())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...wait, what happens if this argument is None? I realize this is almost certainly going to be a case of severe programmer error, but we should test the None case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption here was that it was up to the worker to handle the argument correctly. What exactly do you suggest to test? Maybe I missing your train of thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I mostly mean that this basically goes Option<Datum> -> NullableDatum -> Option<Datum> (via Postgres) and we should make sure, as simple as it sounds, that if we say the Datum is "SQL NULL" (i.e. None) that it also becomes a None on the other side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( I have recently become suspicious of these kinds of transitions after finding out that sometimes Postgres will pass values that are basically nullptr_t, true and so I am wondering if there are other hijinx in store. )

I am also kind of wondering what happens if a BackgroundWorker, er, fucks up massively, and if PGX can actually recover from that with our current API surface.

pg_sys::BgwHandleStatus_BGWH_NOT_YET_STARTED => BackgroundWorkerStatus::NotYetStarted,
pg_sys::BgwHandleStatus_BGWH_STOPPED => BackgroundWorkerStatus::Stopped,
pg_sys::BgwHandleStatus_BGWH_POSTMASTER_DIED => BackgroundWorkerStatus::PostmasterDied,
_ => unreachable!(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => unreachable!(),
_ => unreachable!("postgres returned unknown BgwHandleStatus"),

///
/// Requires `BackgroundWorkerBuilder.bgw_notify_pid` to be set to `pg_sys::MyProcPid`, otherwise it'll
/// return [`BackgroundWorkerStatus::Untracked`] error
pub fn wait_for_shutdown(&self) -> Result<(), BackgroundWorkerStatus> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is erroneous to do anything but reap the worker. We should enforce that.

Suggested change
pub fn wait_for_shutdown(&self) -> Result<(), BackgroundWorkerStatus> {
pub fn wait_for_shutdown(self) -> Result<(), BackgroundWorkerStatus> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure this would be the correct handling. Waiting for an event that may not succeed shouldn't necessarily reap the handle, should it? I agree that this is rather a corner case, but let's say it returns, um, Untracked error status, and you still have the pid to work with (of course, you can save it beforehand, but I am just trying to explore the idea why do we need to reap it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. If it was Untracked, then what valid, safe actions remain with that pid that require it to remain attached to its source TerminatingBackgroundWorker type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going outside of PG's envelope, sending a signal to that PID?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Isn't that fine though because that means we're doing another form of IPC then (and quite possibly unsafe code! or at least code that isn't in PGX) and we still have dispensed with the idea of it being some kind of BackgroundWorker, we're just trying to murder what we now see as a rogue process with whatever sky-lasers the OS has to offer us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_shutdown isn't a request for murder; it simply waits until the worker dies. Very Zen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, right, but interacting with the PID "directly" isn't that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would somewhat prefer to optimistically assume that consuming is okay (especially since a consuming-oriented approach is what's used by the rest of this new API surface) and revisit this assumption if we find a use case that is blocked by this.

///
/// Requires `BackgroundWorkerBuilder.bgw_notify_pid` to be set to `pg_sys::MyProcPid`, otherwise it'll
/// return [`BackgroundWorkerStatus::Untracked`] error
pub fn wait_for_shutdown(self) -> Result<(), BackgroundWorkerStatus> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when this is called on a worker that hasn't been started yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code suggests that this is fine. It'll simply wait until it is stopped.

@workingjubilee
Copy link
Member

A design question that came up in my mind that I don't know enough about BackgroundWorkers to remark on: Can a BackgroundWorker, if registered and loaded dynamically, move to any other lifecycle point? Or does it always progress in a fairly linear fashion (from "not yet started" to "active" to "ended")? In other words: is this enum truly an enum or is this trying to represent a state machine? We can enforce a much more rigid progression if so, if we want to. This would make the design more complex but would also make it harder to misuse the interface.

From @workingjubilee:

When a handle is a noun, typical Rust form is to use its name unadorned
(almost all our "smart pointers" are also "handles", technically
 speaking). Exceptions are for verbs (JoinHandle) and truly ambiguous
types (e.g. RawHandle and OwnedHandle).

Solution: rename relevant types
@yrashk
Copy link
Contributor Author

yrashk commented Oct 18, 2022

A design question that came up in my mind that I don't know enough about BackgroundWorkers to remark on: Can a BackgroundWorker, if registered and loaded dynamically, move to any other lifecycle point? Or does it always progress in a fairly linear fashion (from "not yet started" to "active" to "ended")? In other words: is this enum truly an enum or is this trying to represent a state machine? We can enforce a much more rigid progression if so, if we want to. This would make the design more complex but would also make it harder to misuse the interface.

I can't say I am an expert in the intricacies of PostgreSQL. Reading this:

typedef enum BgwHandleStatus
{
BGWH_STARTED, /* worker is running /
BGWH_NOT_YET_STARTED, /
worker hasn't been started yet /
BGWH_STOPPED, /
worker has exited /
BGWH_POSTMASTER_DIED /
postmaster died; worker status unclear */
} BgwHandleStatus;

and the documentation suggests that it is linear. not yet started -> started -> stopped and unclear.

What kind of state machine did you have in mind that it might represent?

@workingjubilee
Copy link
Member

I figured something like that would be the case. I was wondering if using a generic-enforced state machine (as in https://hoverbear.org/blog/rust-state-machine-pattern/ 3.3 and 3.4) might be better as a model (with the obvious clause that we can go from the "ending" back to the start again if we restart the worker, I think?) or if that would be too cumbersome. We kind of already have that half-way here.

@yrashk
Copy link
Contributor Author

yrashk commented Oct 18, 2022

I figured something like that would be the case. I was wondering if using a generic-enforced state machine (as in https://hoverbear.org/blog/rust-state-machine-pattern/ 3.3 and 3.4) might be better as a model (with the obvious clause that we can go from the "ending" back to the start again if we restart the worker, I think?) or if that would be too cumbersome. We kind of already have that half-way here.

My current feeling about this is that it is a) an overkill for now b) Postgres handles restarts itself so one can't really go "stopped->not_started->started" on their own.

I propose to break this out into a separate discussion. I am more than willing to commit some time to improving these small details in followups.

@workingjubilee
Copy link
Member

Hm. I think it's fine to defer some of this to followups and I think we'll need a little time experimenting with this before we can be sure of what the right API is, anyways, so a major break on it is inevitable, but in the general case we should try to tend towards stricter (within reason) APIs -> less strict APIs. It's harder to take back relaxations of the rules when you've already smoothed a given path for something.

@yrashk
Copy link
Contributor Author

yrashk commented Oct 19, 2022 via email

It doesn't match the signature of
`DynamicBackgroundWorker::wait_for_shutdown`

Solution: ensure they are the same and there's no confusion
@yrashk
Copy link
Contributor Author

yrashk commented Oct 19, 2022

@workingjubilee I've addressed the issue with TerminatingDynamicBackgroundWorker::wait_for_shutdown signature as we've discussed.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then away we go!

@workingjubilee workingjubilee merged commit 8aa148a into pgcentralfoundation:develop Oct 19, 2022
@yrashk
Copy link
Contributor Author

yrashk commented Oct 19, 2022

Woohoo! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants