enhancement(observability): GraphQL client#3778
Conversation
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
| default-run = "vector" | ||
|
|
||
| [[bin]] | ||
| name = "graphql-schema" |
There was a problem hiding this comment.
Introduced this for running schema generation via Cargo:
cargo run --bin graphql-schemaI looked for an existing pattern prior to doing this, to see if there are other entrypoints / helper bins running elsewhere, which didn't yield anything obvious. It doesn't really fit a workspace candidate for lib -- it's not really a library, nor does it provide an obvious way to call back into parent deps.
If there is a preferred approach to running helpers that aren't part of the main vector bin (which might include where to keep the source), would appreciate the pointer.
There was a problem hiding this comment.
Do you intend for this to be a build time thing? It could be part of a build script and run on changed with https://doc.rust-lang.org/cargo/reference/build-scripts.html ?
| @@ -0,0 +1,3 @@ | |||
| query HealthQuery { | |||
There was a problem hiding this comment.
All queries (and mutations/subscriptions) are placed in .graphql files, which are validated at compile time and embedded in the bin as strings.
| use vector::api::schema::build_schema; | ||
|
|
||
| static INTROSPECTION_QUERY: &str = r#" | ||
| query IntrospectionQuery { |
There was a problem hiding this comment.
This is the full introspection query that retrieves all parts of the GraphQL schema, including a few of the features we don't currently use (like directives) but may in future.
| #[tokio::main] | ||
| async fn main() { | ||
| let schema = build_schema().finish(); | ||
| let res = schema.execute(INTROSPECTION_QUERY).await; |
There was a problem hiding this comment.
This executes the introspection query against the live schema, outside of the Warp instantiation. This is interesting IMO, because it shows how we could bypass HTTP altogether and add a middle tier for marshalling requests via essentially any protocol. HTTP isn't a hard requirement!
There was a problem hiding this comment.
I wonder if this could be used by things like our wasm engine. 🤔
There was a problem hiding this comment.
graphql-client is WASM-compatible; all queries, subscriptions, (de)serialisation of payloads etc should work.
I used tokio-tungstenite as the WebSocket client and async-graphql as the GraphQL server. I'm not sure if/how we'd intend to use these in wasm, but there's probably also the option of leveraging the native browser APIs for WebSockets.
Let me know if you have any specific use-cases in mind for leveraging GraphQL with WASM-- interesting!
There was a problem hiding this comment.
I definitely want to expose a query(some_graphql_string) via the hostcall API. :) I'll do that later though!
|
|
||
| let json = serde_json::to_string_pretty(&async_graphql::http::GQLResponse(res)).unwrap(); | ||
|
|
||
| fs::write("graphql/schema.json", json).expect("Couldn't save schema file"); |
There was a problem hiding this comment.
Stores the schema at graphql/schema.json, based on the cwd. I'm expecting this would typically be invoked via cargo run and therefore relative to Cargo.toml. I'm also assuming that the graphql dir actually exists, which it should by default (since this file is checked in.)
If we ever want to deploy this separately (say, in a Docker image) or handle cases where the dir structure might not match, then a mkdir -p approach + CLI opts would be an obvious upgrade path. Kept this simple for now.
There was a problem hiding this comment.
I've taken this out of draft - it should be good to review again.
The main change vs. the last time is the handling of streaming GraphQL subscriptions. I added a SubscriptionClient that aims to work in a similar fashion to the GraphQL client libraries in Javascript, such as Apollo's subscription-transport-ws.
Some notes on how this is specifically implemented follow:
WebSockets
GraphQL subscriptions are typically transported over WebSockets. This allows them to interface natively with modern browsers, and provides bi-directional comms -- required for the client to issue subscription requests, and receive arbitrary payloads.
Multiplexed subscriptions
The WebSocket connection is intended to be long-lived. It can handle multiple disparate GraphQL subscriptions. Control messages and payloads are delivered over a single connection. An { id: Uuid } is attached to each subscription { type: "start" } request (in this client; marshals to a String), and the same ID is included in payloads. This enables the client to determine how to route payloads.
WeakValueHashMap + broadcast channels
I use weak_table's WeakValueHashMap to store subscriptions against the client. The caller is intended to control the lifetime of the subscription; when it goes out of scope, the subscription is removed from the client.
As payloads are received, their { id } is checked and routed to the appropriate Subscription, mapped against their Uuid.
Implicit dropping
When SubscriptionClient goes out of scope, the connection is terminated. The GraphQL server should cancel all active subscriptions.
When an individual Subscription goes out of scope, its Drop implementation sends an { id, type: "stop" } message to the GraphQL client, which should terminate that one subscription and keep others running.
@sghall, tagging you on this too, as both a sanity check against the equivalent implementation in Urql (making sure control messages, etc are right-- I'll do another sweep myself too) since I know we've both been in the weeds with the WS network tab in Dev Tools... and for general edification for how this PR generates queries/subscriptions, similar to GraphQL Code Generator in TS.
I imagine you'll want to play around generating queries yourself once we're at that point in the UI, so just mostly just a heads-up on this initial approach.
Signed-off-by: Lee Benson <lee@leebenson.com>
|
A note to reviewers: I'll be adding tests today for:
This is still good to review, but FYI there will be additional tests to look at shortly. |
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
|
I've implemented all the tests that I think are relevant to the scope of this PR, including the various health checks, heartbeats, uptime metrics, and events processed. @Hoverbear, @JeanMertz -- this wound up being a sizeable PR conceptually, as well as LOC, so I know there's a fair chunk to review here. I appreciate you taking the time to look at this. Further evolution of the schema should hopefully happen in smaller steps, now that the API server + client are out of the way. If you have any questions or want me to go into further detail on any of this, happy to walk through on git/chat/zoom/loom as preferred. Thanks! |
Signed-off-by: Lee Benson <lee@leebenson.com>
|
@leebenson taking a look at this PR. Really great stuff here! Just wanted to note that I can occasionally get the api tests to fail locally on my mac. I tried my best to figure it out, but to no avail 😞 I'm logging out that "seconds: 0" you see in the output (when it passes seconds == 3). That's in the stevenhall@steves-macbook vector % cargo test --test api -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.37s
Running target/debug/deps/api-491a23c67c646c19
running 8 tests
test tests::api_playground_disabled ... ok
test tests::api_playground_enabled ... ok
test tests::api_health ... ok
test tests::api_graphql_health ... ok
test tests::api_graphql_event_processed_metrics ... ok
test tests::api_graphql_heartbeat ... ok
seconds: 0
thread 'tests::api_graphql_combined_heartbeat_uptime' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test tests::api_graphql_combined_heartbeat_uptime ... FAILED
seconds: 0
thread 'tests::api_graphql_uptime_metrics' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
test tests::api_graphql_uptime_metrics ... FAILED
failures:
failures:
tests::api_graphql_combined_heartbeat_uptime
tests::api_graphql_uptime_metrics
test result: FAILED. 6 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out |
|
Thanks for checking it out, and surfacing those tests @sghall. Looks like it's 'uptime' in both of those tests. Each test initialises metrics with: // Initialize the metrics system. Idempotent.
fn init_metrics() {
METRIC_INIT.call_once(|| {
let _ = vector::metrics::init();
// Spawn the internal 'heartbeat' event, which triggers uptime
tokio::spawn(heartbeat::heartbeat());
})
}With /// Emits Heartbeat event every second.
pub async fn heartbeat() {
let since = Instant::now();
let mut interval = interval(Duration::from_secs(1));
loop {
interval.tick().await;
emit!(Heartbeat { since });
}
}It could be that the GraphQL subscription query returns '0' on its first run and '0' on the second, firing faster than the interval is able to emit. I'll update the streaming interval to be 1,200ms for GraphQL to cover that possibility. Thanks for catching! |
* wip Signed-off-by: Lee Benson <lee@leebenson.com> * API v2 Signed-off-by: Lee Benson <lee@leebenson.com> * -println Signed-off-by: Lee Benson <lee@leebenson.com> * happy clippy Signed-off-by: Lee Benson <lee@leebenson.com> * copyable playground url Signed-off-by: Lee Benson <lee@leebenson.com> * loop Signed-off-by: Lee Benson <lee@leebenson.com> * ohNo * fix conflicts Signed-off-by: Lee Benson <lee@leebenson.com> * fix config tests Signed-off-by: Lee Benson <lee@leebenson.com> * stop breaking things Signed-off-by: Lee Benson <lee@leebenson.com> * separate metrics Signed-off-by: Lee Benson <lee@leebenson.com> * 3x metrics Signed-off-by: Lee Benson <lee@leebenson.com> * health and playground tests Signed-off-by: Lee Benson <lee@leebenson.com> * graphql client; health test Signed-off-by: Lee Benson <lee@leebenson.com> * start server Signed-off-by: Lee Benson <lee@leebenson.com> * required-features Signed-off-by: Lee Benson <lee@leebenson.com> * Update Cargo.lock * server shutdown Signed-off-by: Lee Benson <lee@leebenson.com> * fix API server scope Signed-off-by: Lee Benson <lee@leebenson.com> * drop(server) Signed-off-by: Lee Benson <lee@leebenson.com> * fix events/mod * minor cleanup Signed-off-by: Lee Benson <lee@leebenson.com> * lint Signed-off-by: Lee Benson <lee@leebenson.com> * wip * heartbeat Signed-off-by: Lee Benson <lee@leebenson.com> * retry_until * check-internal-events Signed-off-by: Lee Benson <lee@leebenson.com> * working client * check-fmt Signed-off-by: Lee Benson <lee@leebenson.com> * heartbeat assertions Signed-off-by: Lee Benson <lee@leebenson.com> * gen newline * drop moot helper * close -> stop Signed-off-by: Lee Benson <lee@leebenson.com> * uptime metrics Signed-off-by: Lee Benson <lee@leebenson.com> * events processed test Signed-off-by: Lee Benson <lee@leebenson.com> * parallel subscriptions Signed-off-by: Lee Benson <lee@leebenson.com> * minor use Signed-off-by: Lee Benson <lee@leebenson.com> * extra commentary Signed-off-by: Lee Benson <lee@leebenson.com> * fix tests Signed-off-by: Lee Benson <lee@leebenson.com> Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Adds graphql-client, for typed GraphQL queries against the API server implemented in #3708. Currently targets that same PR for visibility, and is a draft for now.
It comprises the following elements:
Schema generation
#3708 introduces async-graphql as a code-first implementation of GraphQL schema.
Since graphql-client validates queries at compile-time, it requires a reference to the schema file (either GraphQL SDL, or the JSON equivalent). Async-graphql doesn't generate this out of the box.
I've added a generator in
src/api/schema/gen.rs, which builds a runtime version of the schema, issues an introspection query against it, and serializes the response to JSON. It can be executed with:The resulting file will be saved to
graphql/schema.json, relative to the Cargo.toml root.This file is intended to be checked in. In Alloy, we had a CI test to ensure the checked in version matches the latest schema, by doing a git diff. I think it'd make sense to add one here too, to guarantee all queries are up to date.
Querying
To build a query, we derive from graphql-client's
GraphQLQuerytype, and provide links to the GraphQL schema, and the individual.graphqlfile containing the query:In this example,
grapql/queries/health.graphqlcontains:The name of the struct must match the text label provided in the query. Multiple queries can optionally be combined into a single
.graphql(generally not relevant/useful), and the macro will expand the correct query.The validity of the query is asserted by matching against the schema. Any errors reflect during build, e.g if we change
health->healthz, we get:Executing a query / parsing responses
Graphql-client builds the query, variables and response types. The transport is userland. In
tests/api.rs, I'm using reqwest per:This:
HealthQuerystructreqwestto POST the request body to the/graphqlendpointhealth_query::ResponseData, containinghealth: true.Use cases
Currently, this is only used in API integration tests. My intention is for us to use this with
vector topto issue the same queries that the UI will be issuing (albeit with its own, separate code gen + client libs) and anywhere else we need to make compile-time guarantees against API data -- metrics and otherwise.