Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Implement Postgres Interface #1914

Open
ricochet opened this issue Apr 16, 2024 · 8 comments
Open

[RFC] Implement Postgres Interface #1914

ricochet opened this issue Apr 16, 2024 · 8 comments
Labels
RFC rfc-proposed This issue is a newly proposed RFC and is awaiting comments, questions, and approval
Milestone

Comments

@ricochet
Copy link
Contributor

RFC Title

RFC: Create a Postgres specific interface

Introduction

Prior to 1.0, wasmCloud supported a wasmCloud-sqldb contract. Very similar to wasmCloud-sqldb, there is a proposal for wasi:sql world in the WASI space. Both API's have the goal of being generic across SQL providers, e.g. postgres and MySQL may use the same interface. The wasi:sql proposal has stalled mostly due to a lack of a desire by key stakeholders to implement a SQL interface that does not support semantic types for a given DB.

Motivation

We need a solution for our users to connect to a relational database.

Postgres is one of the most popular DB's (according to StackOverflow surveys, the most popular). Instead of proposing to build interfaces and design those for each major SQL flavor, e.g. postgres, sqllite, and MySQL, I am proposing that we build and implement one SQL flavor (postgres) fully before approaching others. This pattern should be readily replicatable with key differences being the WIT representation of types.

Detailed Design

Instead of using a resource to model a connection, instead use an interface that can later be "resourcified" once necessary CM features like WIT templates are available for declarative and statically verifiable multi-db connections. The types are elided in this example, but this package will define postgres:types that are used in the query and execute functions, e.g. a Result type.

package wasmcloud:postgres {
  interface connection {
    query: func(...) -> ...;
    execute: func(...) -> ...;
  }

Backwards Compatibility

New API so no backwards compatibility

Alternatives Considered

  • Use wasi:sql. This proposal is in the earliest stage in WASI. There are major open questions about its viability concerning the generic approach used in wasi-sql. While many of the "wasi-cloud" API's are targeted for the 80% use-case, SQL is that one tricky area where if the solution is to "just use a string", the client implementations suffer from lossy casting to loosing type semantics.
  • Build sql clients in a set of languages based on wasi-sockets. Add wasmCloud support for sandboxing wasi-sockets with deny-by-default firewalling. Expose a capability-driven link for the socket handles, e.g. we want to trigger policy checks for DNS lookups, etc. This is a large amount of work that is not needed with a specific interface as proposed above. My opinion is that it is much more valuable for the ecosystem to support semantic and static declaration of needed resources like DB connections and lower-level API's like wasi-sockets should be used only when a higher-level capability-driven API would not fit. I expect and hope that we will see a WASI version of the various SQL lingua francas and we will adopt those within wasmCloud as the proposals evolve.

Unresolved Questions

Should we implement this proposal as a custom interface (in order to demonstrate and exercise this path) or something first-party within wasmCloud?

Conclusion

This RFC proposes a wasmCloud interface for postgres that will expose postgres specific types and a reference provider implementation for postgres that could be used with other providers that support the postgres SQL flavor.

@ricochet ricochet added RFC rfc-proposed This issue is a newly proposed RFC and is awaiting comments, questions, and approval labels Apr 16, 2024
@ricochet
Copy link
Contributor Author

ricochet commented Apr 30, 2024

Some of the questions I have so far:

  • Which rust library should we provide for initial tier 1 support in wasmCloud? Preference towards sqlx as I have a strong preference to use sqlx for golang support. Alternatively we could support tokio-postgres
  • Instead of connections as the top-level interface name, would sessions be more apt? I am looking to mirror the data types from the postgres HTTP API.

@vados-cosmonic
Copy link
Contributor

Hmnn, those are some good questions -- I have some thoughts:

Which rust library should we provide for initial tier 1 support in wasmCloud? Preference towards sqlx as I have a strong preference to use sqlx for golang support. Alternatively we could support tokio-postgres

I think sqlx is the more used/popular library in the ecosystem and has a bit more community and flexibility. tokio-postgres is quite low level (which is appropriate for an optimized postgres provider), but sqlx has more of the workflows and ergonomics that I think most people expect from a DB toolkit these days.

I wasn't aware that sqlx in Golang was related but if it helps Golang support, even better.

Instead of connections as the top-level interface name, would sessions be more apt? I am looking to mirror the data types from the postgres HTTP API.

Hmnn I wonder if neither connections NOR sessions make sense for an interface which has query() and execute() -- I personally think that having query and execute as top level interfaces is a bit more descriptive (ex. when looking at the imports of some component and trying to figure out what it does), and connection/session establishment and management should be under another interface.

So then a component might have a WIT like this:

package example:sql-component;

world component {
  import wasmcloud:postgres/query;
  export wasi:cli/run;
}

Clearly, this component does some query and is a run()-able component -- I think it might be a little more opaque if the import was wasmcloud:postgres/connection or wasmcloud:postgres/sessions.

This brings up a usability point -- depending on how query is written, the connection/session in question could actually be optional (and left up to the provider). Technically doing a query and establishing the connection do not both need to be done by a component (though we might expect them to be), so might be worth thinking about that as well.

The most straight forward way to implement an interface like query is to pass along some sort of connection/session ID with every call in a resource-less world of course, but technically we could have some sort of set-active-session(), or with-session() interface that combines with other things.

@thomastaylor312
Copy link
Contributor

I think sqlx is the more used/popular library in the ecosystem and has a bit more community and flexibility. tokio-postgres is quite low level (which is appropriate for an optimized postgres provider), but sqlx has more of the workflows and ergonomics that I think most people expect from a DB toolkit these days.

If this is going to be postgres specific, I think on reflection I'd prefer the thing that lets us optimize for postgres (tokio-postgres). But if we're just going for super high level API, then sqlx is ok for me. I assume @ricochet you're aiming for the latter?

@ricochet
Copy link
Contributor Author

The key reason why I want a representation for a connection/session is so that we have the ability to run optimizations like creating a connection pool as part of the implementation. Queries and exec's belong to the same session.

The most important aspect of a WIT interface is that it supports a declarative definition of necessary resources. Once we have WIT templates, e.g. the ability to specify * : connection then is declaratively possible for multiple connections.

If this were instead represented with a resource, this would imply that these connections may be created and closed at runtime by the component. I specifically do not want to enable this case with wasi 0.2 semantics. The DX is better served by optimizing for the most common case, one connection per component. Components should do one thing well. Composed components may have sub-components that also have their own connections so this design does not preculde multiple-connections per composed component.

The key precedent that we have to build on is wasi:keyvalue/store. Both a connection and store interface have opportunities for host level optimization like an initial authentication handshake.

@vados-cosmonic
Copy link
Contributor

@thomastaylor312 sure I can go either way here -- what I think we give up is more people feeling comfortable contributing, a slightly more alive dependency with better documentation, but we do get query pipelining and some more low level driver performance that is absolutely dwarfed by inefficient queries people will almost certainly write.

It's always possible to switch the underlying library (we can always introduce driver-sqlx vs driver-tokio-postgres feature flags), so this isn't too critical of an upfront design decision for the interface.

Let's go with tokio-postgres for now and introduce another option if it makes sense!

@ricochet

The most important aspect of a WIT interface is that it supports a declarative definition of necessary resources. Once we have WIT templates, e.g. the ability to specify * : connection then is declaratively possible for multiple connections.

With regards to the templating stuff, it'd be great to go after that and rev the contract to update appropriately, I think.

The key reason why I want a representation for a connection/session is so that we have the ability to run optimizations like creating a connection pool as part of the implementation. Queries and exec's belong to the same session.

If this were instead represented with a resource, this would imply that these connections may be created and closed at runtime by the component. I specifically do not want to enable this case with wasi 0.2 semantics. The DX is better served by optimizing for the most common case, one connection per component. Components should do one thing well. Composed components may have sub-components that also have their own connections so this design does not preculde multiple-connections per composed component.

Sorry taken together these are quite confusing, would you mind sketching out a WIT for the interface you're envisioning here? It sounds like you want a connection/session but you don't want them to be created/closed at runtime by the component?

Is the idea that the provider creates a connection when a query comes in? The case of a single component wanting to connect to two databases seems very common, and is something we'd want to support, I can't think of how to do that without using some sort of representation of a connection.

Somewhat as an aside -- If I'm understanding correctly, it's fairly simple to put this under different interfaces -- one exposing more manual access (i.e. including connection management) and one for provider-managed connections?

An example WIT of the interface you're envisioning here would really go a long way to make this more concrete!

To rehash where I think the decision points are:

  • Should components be able to create connections and/or sessions? (I think yes)
  • Should a query() function require a connection/session? (I think this arg could be optional, if it's 100% provider managed)
  • Should connection management and querying/executing be under the same interface? (I think no)

@vados-cosmonic
Copy link
Contributor

vados-cosmonic commented May 2, 2024

To illustrate how I'm thinking about this and to give us something concrete to discuss, I'm thinking of a layout like this:

package wasmcloud:postgres;

interface connection {
  record connection-create-opts{
    /// host to connect to
    host: string, 
    // ...
  }

  /// Essentially a token ID, could be a resource if we want
  type connection-token = string;

  /// things that can go wrong when getting a connection
  variant create-connection error { ... }

  variant connection-options {
   /// Default completely managed connection
    managed,
    /// An existing named profile
    named-profile(string),
    /// Near-completely dynamic connection creation
    ///
    /// NOTE: components probably won't need this, but more dynamic setups might, we can also disable this kind by default at the provider level
    override-params(connection-params),
  }

  /// Create a connection
  create-connection: func(opts: connection-create-opts) -> result<connection-token, create-connection-error>
}

interface session {
  record session-create-opts {
    /// Timeout all queries at the session level
    query-timeout-ms: option<u32>;
    // ....
  }
  
  /// Represents a session
  type session-token = string;

  create-session: func(conn: connection-token, opts: session-create-opts) -> result<session-token, ...>;
}

interface query {
  query: func(token: connection-token, query: ???, params: ???) -> result<???>;
}

interface execute {
  execute: func(token: connection-token, query: ???, params: ???) -> result<???>;
}

Could be resource-driven, could be not, but as you can see the real important DX points are how connection management is done and how the interfaces are split up (if they should be at all)

@thomastaylor312
Copy link
Contributor

thomastaylor312 commented May 2, 2024

I think I lean more towards what @ricochet is saying here. I don’t create or destroy connections. It is more efficient for the implementer, in this case our providers, to handle that connection so that it can be efficient with pooling those connections.

I don’t think any of the functions should require or need some sort of handle parameter in the future. We can handle more options like @ricochet stated.

Also, I don’t quite understand the rationale for splitting up the interfaces for execution and query.

@vados-cosmonic
Copy link
Contributor

Thanks for weighing in! I don't think components requesting connections precludes pooling on the backend -- I'm not sure where that assumption comes from...

I think the important question here is, should components have any control over the connection they're given? If the answer is no then that's certainly an up front design choice which I'm glad we sorted now. This would mean AFAIK know that control over an individual connection must be done via config/link-config, which is certainly also OK!

Also, I don’t quite understand the rationale for splitting up the interfaces for execution and query.

Well I'm sort of mirroring the decision to split query and exec in the first place -- this is a common thing in client libs/SDKs and is in the WIT from earlier in here.

In Postgres at least exec usually refers to executing a prepared statement, rather than a query. So the exec interface would contain machinery related to prepared statements. So maybe the better name for that interface might be prepared-stmt or something like that, that's a great point.

@brooksmtownsend brooksmtownsend added this to the wasmCloud 1.1 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC rfc-proposed This issue is a newly proposed RFC and is awaiting comments, questions, and approval
Projects
Status: In Progress
Development

No branches or pull requests

4 participants