Reusable columns #451

Open
edgarogh opened this issue Feb 21, 2022 · 8 comments
Labels: enhancement (New feature or request), revisit-for:0.6

Comments

@edgarogh

I have a #[pg_extern] function with a lot of columns. Currently 30, but it might come close to 50 because I'm representing relatively complex types with a lot of fields.

The problem is that I also have 30 lines of:

  -> impl Iterator<Item = (
    name!(col_a, Type),
    name!(col_b, Type),
    name!(col_c, Type),
    name!(col_d, Type),
    name!(col_e, Type),
    name!(col_f, Type),
    name!(col_g, Type),
    ...
  )> {

And 30 lines of:

  (
    value_a,
    value_b,
    value_c,
    value_d,
    ...
  )

at the end of my function, with no easy way to know which value corresponds to which column at first glance.

Also, I'm thinking of creating a variant of my current #[pg_extern] function that has different parameters but returns the exact same values in the end... which means I'd have to copy-paste everything and always keep the structure in sync.

I'm looking for a way to define a structure that represents all of these columns as named fields, so all of my problems are solved at once: the function signatures are kept simple and the returned value is much easier to read.

I looked a bit at the documentation and found #[derive(PostgresType)], but it doesn't look like it's what I'm looking for, as it requires deriving serde's traits, which a lot of pgx-provided PostgreSQL types don't derive.
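
Concretely, the shape I'm after would look roughly like this (purely hypothetical; the struct and function names are placeholders, and pgx does not accept a plain struct as the iterator item today):

struct WideRow {
    col_a: Type,
    col_b: Type,
    // ... ~30 more named fields, one per column
}

#[pg_extern]
fn my_query() -> impl Iterator<Item = WideRow> {
    // each yielded WideRow names its columns explicitly,
    // instead of a 30-element anonymous tuple
    todo!()
}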

@edgarogh
Author

As a temporary fix, I made edgarogh/pgx-named-columns.

@eeeebbbbrrrr
Contributor

eeeebbbbrrrr commented Feb 23, 2022

My initial thought, based on the example from your repo, was to instead use macro_rules!, like so:

#[allow(non_snake_case)]
macro_rules! IndexedLetter {
    () => { (name!(index, i32), name!(letter, &'static str)) };
    (idx : $idx:expr, letter : $letter:expr $(,)?) => { ($idx, $letter) };
}

#[pg_extern]
fn alphabet(length: i8) -> impl Iterator<Item = IndexedLetter!()> {
    ALPHABET
        .chars()
        .take(length.clamp(0, 25) as usize)
        .enumerate()
        .map(|(idx, letter)| IndexedLetter! {
            idx: idx as _,
            letter: letter,
        })
}

Unfortunately, pgx fails to compile that with:

error: custom attribute panicked
  --> src/lib.rs:42:1
   |
42 | #[pg_extern]
   | ^^^^^^^^^^^^
   |
   = help: message: not implemented: Only iters with tuples, got Macro(TypeMacro { mac: Macro { path: Path { leading_colon: None, segments: [PathSegment { ident: Ident { ident: "IndexedLetter", span: #0 bytes(1150..1163) }, arguments: None }] }, bang_token: Bang, delimiter: Paren(Paren), tokens: TokenStream [] } }).

Which I suppose makes sense. Perhaps we could teach pgx to expand macros in that position -- I'm not entirely sure. @Hoverbear?

Maybe we could also invent our own expand_struct_to_tuple!() macro, but then we'd have the same issue you've got of finding the struct somewhere.
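
For illustration only, here is a declarative-macro sketch in that direction. The row_struct! name and shape are invented for this example, not an existing pgx API, and it only keeps the struct and its tuple form in sync; #[pg_extern] would still see the tuple type, and the name!() columns are not addressed:

macro_rules! row_struct {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name {
            $($field: $ty,)*
        }

        impl From<$name> for ($($ty,)*) {
            fn from(v: $name) -> Self {
                ($(v.$field,)*)
            }
        }
    };
}

// Expands to `struct IndexedLetter { index: i32, letter: &'static str }`
// plus `impl From<IndexedLetter> for (i32, &'static str)`.
row_struct!(IndexedLetter { index: i32, letter: &'static str });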

I agree that managing an anonymous tuple with lots of fields is unwieldy, but I'm not sure that faking it with a struct is really the right answer. I think it starts to get conflated with actually trying to return that type instead.

@edgarogh
Author

edgarogh commented Feb 23, 2022

edgarogh/pgx-named-columns is clearly a hack, and nothing more. It isn't even capable of forwarding tokens to #[pg_extern()]'s arguments currently. But I absolutely want my actual pgx code to be as DRY as possible, and readability is also a big plus, hence the creation of that crate in the meantime. It will probably not see further development; at most it will be maintained if people submit PRs.

> Perhaps we could teach pgx to expand macros in that position -- I'm not entirely sure.

I'm pretty sure that macros always expand from the outside in. That's why the inert name! macro works: it is (fortunately) only expanded after #[pg_extern] has had a chance to run and read its tokens, i.e. the column names.

It is not, AFAIK, possible to expand a macro manually from within another macro. There are exceptions inside the compiler (e.g. concat!("text_", env!("VARIABLE"))), but user-defined proc-macros simply don't have enough information in the token streams to expand macros. Each and every standard macro (include, file, line, concat, env, ...) would require manual handling, path resolution would have to be re-implemented (e.g. use pgx::*, serde_json::json! {}), declarative macro expansion would need to be completely re-implemented, and the list goes on. Without specific support in the standard proc_macro crate (which could happen, but definitely not today), this looks impossible to do correctly.

At the same time, I'm not sure that macros were ever supposed to be used like expressions (regarding the concat! example). This is off-topic, but a fn concat(&'static [&'static str]) -> &'static str, handled as a special case by the compiler when possible, would make much more sense to me. And for this exact reason, I think that pgx goes too far with its use of macros. They're not supposed to change a function signature in the first place, which #[pg_extern] seems to do.

But I completely understand that it's a complex problem. I'm guessing pgx uses macros so extensively because a lot of type information has to be collected to generate correct SQL schemas, but proc-macros don't have access to type-level information, so they have to fall back on a weird meta-syntax.


To me, the only solution (and I'm not implying that it is simple or realistically feasible, or that contributors would want to spend time on it) would be to cut down on macro usage and rely on more traditional Rust features, most importantly traits.

Here's an example of how it could work with #[pg_extern]ed functions (my only experience with pgx). I don't know if it would actually work, since I don't know every aspect of PostgreSQL extensions; a rough sketch of the trait shapes follows the list:

  • Row structures (IndexedLetter in my example) could derive a PgRow trait
  • All types that can be returned implement a private (or public and unsafe) PgReturnable trait (different from IntoDatum, because it would also be implemented on iterators of PgRow)
  • This trait has a ::dump_type() method that takes no arguments and can dump the PostgreSQL type that corresponds to the function's return type (as raw SQL, or as an intermediate representation)
  • When compiling, 2 passes are made:
    • The extension is first compiled as a standalone binary, without being linked to the PostgreSQL API, and run to obtain the various SQL types and exported functions. Dumped type data is stored in a file.
    • The extension is compiled and linked normally, and macros now have access to type-level information.
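
A minimal sketch of the trait shapes described in the list above, assuming the hypothetical names PgRow and PgReturnable (none of this exists in pgx):

// Hypothetical sketch only; these traits and their names are assumptions.
trait PgRow {
    /// SQL column list for this row type, e.g. "index integer, letter text".
    fn columns() -> String;
}

trait PgReturnable {
    /// Dump the SQL type that corresponds to this Rust return type.
    fn dump_type() -> String;
}

// An iterator of rows would map to a table-valued return type.
impl<I, R> PgReturnable for I
where
    I: Iterator<Item = R>,
    R: PgRow,
{
    fn dump_type() -> String {
        format!("TABLE ({})", R::columns())
    }
}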

Anyways, it's not an easy problem, and I won't mind if this issue is closed since my personal problem is "solved" with my crate. I haven't dived into your codebase yet, but I'd happily try to contribute code if a solution is decided on. And I'm still available if someone wants to continue discussing this issue.

@eeeebbbbrrrr
Contributor

eeeebbbbrrrr commented Feb 23, 2022

#[pg_extern] doesn't change the signature of the function to which it's attached. It makes a new extern "C" function that is exported by the .so, which calls the original function, does the Datum conversions and such, and ultimately that's the function linked up by the CREATE FUNCTION statement.
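
For readers following along, a heavily simplified illustration of the shape being described, not pgx's actual generated code (the real wrapper works in terms of Postgres FunctionCallInfo and Datum values rather than plain i32):

// The user's function is left exactly as written.
fn add_one(x: i32) -> i32 {
    x + 1
}

// #[pg_extern] additionally emits something in the spirit of this wrapper:
// an exported extern "C" function that does the conversions, calls the
// original, and is what CREATE FUNCTION ultimately points at.
#[no_mangle]
pub extern "C" fn add_one_wrapper(x: i32) -> i32 {
    add_one(x)
}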

I haven't considered using the trait system for what are now #[pg_extern] functions, but I'm not really seeing anything that'd be quite as easy for the programmer.

If pgx had its own "expand_struct_to_tuple!()" macro, then I'm sure we could do the expansion ourselves -- probably not super smart tho.

Maybe we invent a #[derive] macro of sorts that you'd apply to your type, and we'd generate the proper From trait implementation to transparently convert it into a tuple behind the scenes.

// in `pgx` itself
pub trait PostgresRowType {}

#[derive(PostgresRowType)]
struct IndexedLetter {
    index: i32,
    letter: &'static str,
}

// generates this...

impl PostgresRowType for IndexedLetter {}
impl From<IndexedLetter> for (i32, &'static str) {
    fn from(v: IndexedLetter) -> Self {
        (v.index, v.letter)
    }
}

Then I think we'd have enough type information to know that...

#[pg_extern]
fn alphabet(length: i8) -> impl Iterator<Item = IndexedLetter> {
    ALPHABET
        .chars()
        .take(length.clamp(0, 25) as usize)
        .enumerate()
        .map(|(idx, letter)| IndexedLetter {
            index: idx as _,
            letter,
        })
}

... is returning something we then need to call .into() for each item returned from the iterator.

Need to noodle it more.


Alternatively, pgx could support Postgres composite types that are mapped to Rust structs, and then you can deal with the named attributes from the SQL.
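
As a rough illustration of that alternative (hypothetical; pgx did not map structs to Postgres composite types at the time of this thread):

// SQL side (assumed):
//     CREATE TYPE indexed_letter AS (index integer, letter text);
//
// Rust side: a struct that would round-trip to that composite type.
// #[derive(PostgresComposite)]   // hypothetical derive, not an existing pgx macro
struct IndexedLetter {
    index: i32,
    letter: String,
}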

@edgarogh
Author

That's a great idea, and probably the simplest! However, how do you find out about the columns and their types to generate the setof table(...) in SQL? Or maybe it is possible to have PostgresRowType generate some kind of type alias that is used as setof row_type? And in the meantime, a static linker item could be emitted so the compiler fails if two types have the same name?

> doesn't change the signature of the function to which it's attached

My bad, I was almost certain that a pub got removed at some point because my functions didn't appear in my Rustdoc, but it must have been my fault.

@Hoverbear
Contributor

Hoverbear commented Feb 23, 2022

You can use an Into/From conversion for this:

#[derive(Default)]
struct MyTypeWithManyFields {
    col_a: i32,
    col_b: i32,
    col_c: i32,
    col_d: i32,
    col_e: i32,
    col_f: i32,
    col_g: i32,
}

impl Into<(i32,i32,i32,i32,i32,i32,i32)> for MyTypeWithManyFields {
    fn into(self) -> (i32,i32,i32,i32,i32,i32,i32) {
        (
            self.col_a,
            self.col_b,
            self.col_c,
            self.col_d,
            self.col_e,
            self.col_f,
            self.col_g,
        )
    }
}

#[pg_extern]
fn many_columns() -> impl Iterator<Item = (
    name!(col_a, i32),
    name!(col_b, i32),
    name!(col_c, i32),
    name!(col_d, i32),
    name!(col_e, i32),
    name!(col_f, i32),
    name!(col_g, i32),
)> {
    (0..=10).map(|_| MyTypeWithManyFields::default()).map(Into::into)
}

Ideally we'd support something like:

type MyManyColumns = (
    name!(col_a, i32),
    name!(col_b, i32),
    name!(col_c, i32),
    name!(col_d, i32),
    name!(col_e, i32),
    name!(col_f, i32),
    name!(col_g, i32),
);

#[derive(Default)]
struct MyTypeWithManyFields {
    col_a: i32,
    col_b: i32,
    col_c: i32,
    col_d: i32,
    col_e: i32,
    col_f: i32,
    col_g: i32,
}

impl Into<MyManyColumns> for MyTypeWithManyFields {
    fn into(self) -> MyManyColumns {
        (
            self.col_a,
            self.col_b,
            self.col_c,
            self.col_d,
            self.col_e,
            self.col_f,
            self.col_g,
        )
    }
}

#[pg_extern]
fn many_columns() -> impl Iterator<Item = MyManyColumns> {
    (0..=10).map(|_| MyTypeWithManyFields::default()).map(Into::into)
}

However, our proc macros aren't really sophisticated enough to handle two parts of this:

  1. name!() isn't feasible to make work in this way. I don't see a solution to this beyond supporting something like
     fn beep() -> impl Iterator<Item = (("col_name", i32))>
     if we can even support that at all... Const generics may sorta allow it.
  2. We currently do some processing of the return type in #[pg_extern]; virtually all of that code would need to be rewritten so it no longer does that, and then we could possibly do some trait trickery.

I think we could solve the second issue, but name support might require some thinking (such as #[pg_extern(column_names = ["a", "b"])])...

@eeeebbbbrrrr
Contributor

> My bad, I was almost certain that a pub got removed at some point because my functions didn't appear in my Rustdoc, but it must have been my fault.

hmm. We shouldn't be doing that. That would be kinda mean!

@workingjubilee added the enhancement (New feature or request) label on Aug 12, 2022
@workingjubilee
Member

workingjubilee commented Sep 28, 2022

After the redesign in #615, we are now capable of using traits to generate SQL. We should start thinking about approaches to this or in general reassessing the design here.
