enhancement(observability): GraphQL client by leebenson · Pull Request #3778 · vectordotdev/vector

leebenson · 2020-09-09T13:56:37Z

Adds graphql-client, for typed GraphQL queries against the API server implemented in #3708. Currently targets that same PR for visibility, and is a draft for now.

It comprises the following elements:

Schema generation

#3708 introduces async-graphql as a code-first implementation of GraphQL schema.

Since graphql-client validates queries at compile-time, it requires a reference to the schema file (either GraphQL SDL, or the JSON equivalent). Async-graphql doesn't generate this out of the box.

I've added a generator in src/api/schema/gen.rs, which builds a runtime version of the schema, issues an introspection query against it, and serializes the response to JSON. It can be executed with:

cargo run --bin graphql-schema

The resulting file will be saved to graphql/schema.json, relative to the Cargo.toml root.

This file is intended to be checked in. In Alloy, we had a CI test to ensure the checked in version matches the latest schema, by doing a git diff. I think it'd make sense to add one here too, to guarantee all queries are up to date.

Querying

To build a query, we derive from graphql-client's GraphQLQuery type, and provide links to the GraphQL schema, and the individual .graphql file containing the query:

#[derive(GraphQLQuery)]
#[graphql(
    schema_path = "graphql/schema.json",
    query_path = "graphql/queries/health.graphql",
    response_derives = "Debug"
)]
struct HealthQuery;

In this example, grapql/queries/health.graphql contains:

query HealthQuery {
  health
}

The name of the struct must match the text label provided in the query. Multiple queries can optionally be combined into a single .graphql (generally not relevant/useful), and the macro will expand the correct query.

The validity of the query is asserted by matching against the schema. Any errors reflect during build, e.g if we change health -> healthz, we get:

error: proc-macro derive panicked
  --> tests/api.rs:16:14
   |
16 |     #[derive(GraphQLQuery)]
   |              ^^^^^^^^^^^^
   |
   = help: message: Code generation failed.
           could not find field `healthz`

Executing a query / parsing responses

Graphql-client builds the query, variables and response types. The transport is userland. In tests/api.rs, I'm using reqwest per:

async fn query<T: GraphQLQuery>(
    request_body: &graphql_client::QueryBody<T::Variables>,
) -> graphql_client::Response<T::ResponseData> {
    let config = api_enabled_config();
    let addr = config.api.bind.unwrap();
    let url = format!("http://{}:{}/graphql", addr.ip(), addr.port());

    start_api_server(addr, false);

    let client = reqwest::Client::new();

    retry_until(
        || client.post(&url).json(&request_body).send(),
        Duration::from_millis(100),
        Duration::from_secs(10),
    )
    .await
    .json()
    .await
    .unwrap()
}

#[tokio::test]
async fn api_graphql_health() {
    let request_body = HealthQuery::build_query(health_query::Variables);
    let res = query::<HealthQuery>(&request_body).await;

    assert_matches!(
        res,
        graphql_client::Response {
            data: Some(health_query::ResponseData { health: true }),
            errors: None,
        }
    );
}

This:

Builds the GraphQL query, from the generated HealthQuery struct
Spawns the API server
Uses reqwest to POST the request body to the /graphql endpoint
Validates against the typed health_query::ResponseData, containing health: true.

Use cases

Currently, this is only used in API integration tests. My intention is for us to use this with vector top to issue the same queries that the UI will be issuing (albeit with its own, separate code gen + client libs) and anywhere else we need to make compile-time guarantees against API data -- metrics and otherwise.

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson · 2020-09-09T14:55:23Z

+default-run = "vector"
+
+[[bin]]
+name = "graphql-schema"


Introduced this for running schema generation via Cargo:

cargo run --bin graphql-schema

I looked for an existing pattern prior to doing this, to see if there are other entrypoints / helper bins running elsewhere, which didn't yield anything obvious. It doesn't really fit a workspace candidate for lib -- it's not really a library, nor does it provide an obvious way to call back into parent deps.

If there is a preferred approach to running helpers that aren't part of the main vector bin (which might include where to keep the source), would appreciate the pointer.

Do you intend for this to be a build time thing? It could be part of a build script and run on changed with https://doc.rust-lang.org/cargo/reference/build-scripts.html ?

leebenson · 2020-09-09T14:56:42Z

@@ -0,0 +1,3 @@
+query HealthQuery {


All queries (and mutations/subscriptions) are placed in .graphql files, which are validated at compile time and embedded in the bin as strings.

leebenson · 2020-09-09T14:58:06Z

+use vector::api::schema::build_schema;
+
+static INTROSPECTION_QUERY: &str = r#"
+query IntrospectionQuery {


This is the full introspection query that retrieves all parts of the GraphQL schema, including a few of the features we don't currently use (like directives) but may in future.

leebenson · 2020-09-09T14:59:31Z

+#[tokio::main]
+async fn main() {
+    let schema = build_schema().finish();
+    let res = schema.execute(INTROSPECTION_QUERY).await;


This executes the introspection query against the live schema, outside of the Warp instantiation. This is interesting IMO, because it shows how we could bypass HTTP altogether and add a middle tier for marshalling requests via essentially any protocol. HTTP isn't a hard requirement!

I wonder if this could be used by things like our wasm engine. 🤔

graphql-client is WASM-compatible; all queries, subscriptions, (de)serialisation of payloads etc should work.

I used tokio-tungstenite as the WebSocket client and async-graphql as the GraphQL server. I'm not sure if/how we'd intend to use these in wasm, but there's probably also the option of leveraging the native browser APIs for WebSockets.

Let me know if you have any specific use-cases in mind for leveraging GraphQL with WASM-- interesting!

I definitely want to expose a query(some_graphql_string) via the hostcall API. :) I'll do that later though!

leebenson · 2020-09-09T15:03:21Z

+
+    let json = serde_json::to_string_pretty(&async_graphql::http::GQLResponse(res)).unwrap();
+
+    fs::write("graphql/schema.json", json).expect("Couldn't save schema file");


Stores the schema at graphql/schema.json, based on the cwd. I'm expecting this would typically be invoked via cargo run and therefore relative to Cargo.toml. I'm also assuming that the graphql dir actually exists, which it should by default (since this file is checked in.)

If we ever want to deploy this separately (say, in a Docker image) or handle cases where the dir structure might not match, then a mkdir -p approach + CLI opts would be an obvious upgrade path. Kept this simple for now.

leebenson

I've taken this out of draft - it should be good to review again.

The main change vs. the last time is the handling of streaming GraphQL subscriptions. I added a SubscriptionClient that aims to work in a similar fashion to the GraphQL client libraries in Javascript, such as Apollo's subscription-transport-ws.

Some notes on how this is specifically implemented follow:

WebSockets

GraphQL subscriptions are typically transported over WebSockets. This allows them to interface natively with modern browsers, and provides bi-directional comms -- required for the client to issue subscription requests, and receive arbitrary payloads.

Multiplexed subscriptions

The WebSocket connection is intended to be long-lived. It can handle multiple disparate GraphQL subscriptions. Control messages and payloads are delivered over a single connection. An { id: Uuid } is attached to each subscription { type: "start" } request (in this client; marshals to a String), and the same ID is included in payloads. This enables the client to determine how to route payloads.

WeakValueHashMap + broadcast channels

I use weak_table's WeakValueHashMap to store subscriptions against the client. The caller is intended to control the lifetime of the subscription; when it goes out of scope, the subscription is removed from the client.

As payloads are received, their { id } is checked and routed to the appropriate Subscription, mapped against their Uuid.

Implicit dropping

When SubscriptionClient goes out of scope, the connection is terminated. The GraphQL server should cancel all active subscriptions.

When an individual Subscription goes out of scope, its Drop implementation sends an { id, type: "stop" } message to the GraphQL client, which should terminate that one subscription and keep others running.

@sghall, tagging you on this too, as both a sanity check against the equivalent implementation in Urql (making sure control messages, etc are right-- I'll do another sweep myself too) since I know we've both been in the weeds with the WS network tab in Dev Tools... and for general edification for how this PR generates queries/subscriptions, similar to GraphQL Code Generator in TS.

I imagine you'll want to play around generating queries yourself once we're at that point in the UI, so just mostly just a heads-up on this initial approach.

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson · 2020-09-21T06:58:32Z

A note to reviewers: I'll be adding tests today for:

Events processed
Uptime
Multiple requests in flight in parallel

This is still good to review, but FYI there will be additional tests to look at shortly.

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson · 2020-09-22T18:51:04Z

I've implemented all the tests that I think are relevant to the scope of this PR, including the various health checks, heartbeats, uptime metrics, and events processed.

@Hoverbear, @JeanMertz -- this wound up being a sizeable PR conceptually, as well as LOC, so I know there's a fair chunk to review here. I appreciate you taking the time to look at this.

Further evolution of the schema should hopefully happen in smaller steps, now that the API server + client are out of the way.

If you have any questions or want me to go into further detail on any of this, happy to walk through on git/chat/zoom/loom as preferred. Thanks!

Signed-off-by: Lee Benson <lee@leebenson.com>

sghall · 2020-09-24T22:32:05Z

@leebenson taking a look at this PR. Really great stuff here!

Just wanted to note that I can occasionally get the api tests to fail locally on my mac. I tried my best to figure it out, but to no avail 😞 I'm logging out that "seconds: 0" you see in the output (when it passes seconds == 3). That's in the new_uptime_subscription function in the api tests. Seems really random. Have to run it ~5-10 times to get it to fail. It's always these same two tests.

stevenhall@steves-macbook vector % cargo test --test api -- --nocapture
    Finished test [unoptimized + debuginfo] target(s) in 0.37s
     Running target/debug/deps/api-491a23c67c646c19

running 8 tests
test tests::api_playground_disabled ... ok
test tests::api_playground_enabled ... ok
test tests::api_health ... ok
test tests::api_graphql_health ... ok
test tests::api_graphql_event_processed_metrics ... ok
test tests::api_graphql_heartbeat ... ok
seconds: 0
thread 'tests::api_graphql_combined_heartbeat_uptime' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test tests::api_graphql_combined_heartbeat_uptime ... FAILED
seconds: 0
thread 'tests::api_graphql_uptime_metrics' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
test tests::api_graphql_uptime_metrics ... FAILED

failures:

failures:
    tests::api_graphql_combined_heartbeat_uptime
    tests::api_graphql_uptime_metrics

test result: FAILED. 6 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

leebenson · 2020-09-25T05:55:45Z

Thanks for checking it out, and surfacing those tests @sghall. Looks like it's 'uptime' in both of those tests. Each test initialises metrics with:

// Initialize the metrics system. Idempotent.
fn init_metrics() {
    METRIC_INIT.call_once(|| {
        let _ = vector::metrics::init();

        // Spawn the internal 'heartbeat' event, which triggers uptime
        tokio::spawn(heartbeat::heartbeat());
    })
}

With heartbeat::heartbeat() being:

/// Emits Heartbeat event every second.
pub async fn heartbeat() {
    let since = Instant::now();
    let mut interval = interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        emit!(Heartbeat { since });
    }
}

It could be that the GraphQL subscription query returns '0' on its first run and '0' on the second, firing faster than the interval is able to emit.

I'll update the streaming interval to be 1,200ms for GraphQL to cover that possibility.

Thanks for catching!

Signed-off-by: Lee Benson <lee@leebenson.com>

* wip Signed-off-by: Lee Benson <lee@leebenson.com> * API v2 Signed-off-by: Lee Benson <lee@leebenson.com> * -println Signed-off-by: Lee Benson <lee@leebenson.com> * happy clippy Signed-off-by: Lee Benson <lee@leebenson.com> * copyable playground url Signed-off-by: Lee Benson <lee@leebenson.com> * loop Signed-off-by: Lee Benson <lee@leebenson.com> * ohNo * fix conflicts Signed-off-by: Lee Benson <lee@leebenson.com> * fix config tests Signed-off-by: Lee Benson <lee@leebenson.com> * stop breaking things Signed-off-by: Lee Benson <lee@leebenson.com> * separate metrics Signed-off-by: Lee Benson <lee@leebenson.com> * 3x metrics Signed-off-by: Lee Benson <lee@leebenson.com> * health and playground tests Signed-off-by: Lee Benson <lee@leebenson.com> * graphql client; health test Signed-off-by: Lee Benson <lee@leebenson.com> * start server Signed-off-by: Lee Benson <lee@leebenson.com> * required-features Signed-off-by: Lee Benson <lee@leebenson.com> * Update Cargo.lock * server shutdown Signed-off-by: Lee Benson <lee@leebenson.com> * fix API server scope Signed-off-by: Lee Benson <lee@leebenson.com> * drop(server) Signed-off-by: Lee Benson <lee@leebenson.com> * fix events/mod * minor cleanup Signed-off-by: Lee Benson <lee@leebenson.com> * lint Signed-off-by: Lee Benson <lee@leebenson.com> * wip * heartbeat Signed-off-by: Lee Benson <lee@leebenson.com> * retry_until * check-internal-events Signed-off-by: Lee Benson <lee@leebenson.com> * working client * check-fmt Signed-off-by: Lee Benson <lee@leebenson.com> * heartbeat assertions Signed-off-by: Lee Benson <lee@leebenson.com> * gen newline * drop moot helper * close -> stop Signed-off-by: Lee Benson <lee@leebenson.com> * uptime metrics Signed-off-by: Lee Benson <lee@leebenson.com> * events processed test Signed-off-by: Lee Benson <lee@leebenson.com> * parallel subscriptions Signed-off-by: Lee Benson <lee@leebenson.com> * minor use Signed-off-by: Lee Benson <lee@leebenson.com> * extra commentary Signed-off-by: Lee Benson <lee@leebenson.com> * fix tests Signed-off-by: Lee Benson <lee@leebenson.com> Signed-off-by: Brian Menges <brian.menges@anaplan.com>

leebenson added 17 commits September 4, 2020 15:04

wip

214cf2b

Signed-off-by: Lee Benson <lee@leebenson.com>

API v2

0e4cc9d

Signed-off-by: Lee Benson <lee@leebenson.com>

-println

2b5bd32

Signed-off-by: Lee Benson <lee@leebenson.com>

Merge branch 'master' into leebenson/graphqlv2

aa7dd84

happy clippy

616dd01

Signed-off-by: Lee Benson <lee@leebenson.com>

copyable playground url

5860bc7

Signed-off-by: Lee Benson <lee@leebenson.com>

loop

6e41e1d

Signed-off-by: Lee Benson <lee@leebenson.com>

ohNo

d70a5d9

Merge branch 'master' into leebenson/graphqlv2

0b217c2

fix conflicts

8e40cd7

Signed-off-by: Lee Benson <lee@leebenson.com>

fix config tests

04e6fe8

Signed-off-by: Lee Benson <lee@leebenson.com>

stop breaking things

868cb3c

Signed-off-by: Lee Benson <lee@leebenson.com>

separate metrics

bb4cec2

Signed-off-by: Lee Benson <lee@leebenson.com>

3x metrics

997f21a

Signed-off-by: Lee Benson <lee@leebenson.com>

health and playground tests

5fe3c13

Signed-off-by: Lee Benson <lee@leebenson.com>

graphql client; health test

2c5dc60

Signed-off-by: Lee Benson <lee@leebenson.com>

start server

00409d3

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson added the domain: observability Anything related to monitoring/observing Vector label Sep 9, 2020

leebenson added this to the 2020-08-31 - Digitization Laser milestone Sep 9, 2020

leebenson self-assigned this Sep 9, 2020

leebenson requested review from JeanMertz, bruceg, jamtur01 and lukesteensen September 9, 2020 13:56

required-features

cdf0d55

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson commented Sep 9, 2020

View reviewed changes

leebenson commented Sep 18, 2020

View reviewed changes

leebenson requested a review from sghall September 18, 2020 13:52

leebenson added 2 commits September 18, 2020 14:55

drop moot helper

1e25973

close -> stop

9db04bf

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson added 5 commits September 21, 2020 12:36

Merge branch 'master' into leebenson/graphql-client

74ee24c

uptime metrics

1d88911

Signed-off-by: Lee Benson <lee@leebenson.com>

events processed test

7ece958

Signed-off-by: Lee Benson <lee@leebenson.com>

parallel subscriptions

c3ab13a

Signed-off-by: Lee Benson <lee@leebenson.com>

Merge branch 'master' into leebenson/graphql-client

6acd45e

leebenson added 2 commits September 22, 2020 19:53

minor use

2cbab3f

Signed-off-by: Lee Benson <lee@leebenson.com>

extra commentary

e878061

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson modified the milestones: 2020-09-14 - The Grid, 2020-09-28 - Derezzed Sep 24, 2020

Hoverbear approved these changes Sep 24, 2020

View reviewed changes

Merge branch 'master' into leebenson/graphql-client

d8626d5

sghall approved these changes Sep 24, 2020

View reviewed changes

leebenson added 2 commits September 25, 2020 06:58

Merge branch 'master' into leebenson/graphql-client

de0617d

fix tests

c5c9491

Signed-off-by: Lee Benson <lee@leebenson.com>

leebenson merged commit ad77d20 into master Sep 25, 2020

leebenson deleted the leebenson/graphql-client branch September 25, 2020 07:31

This was referenced Sep 25, 2020

chore(rfcs): API topology RFC #3677

Closed

Fix flaky uptime_metrics tests for macOS #4175

Closed

Hoverbear mentioned this pull request Sep 28, 2020

chore(deps): bump uuid from 0.7.4 to 0.8.1 #4159

Merged


		let json = serde_json::to_string_pretty(&async_graphql::http::GQLResponse(res)).unwrap();

		fs::write("graphql/schema.json", json).expect("Couldn't save schema file");

Conversation

leebenson commented Sep 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Schema generation

Querying

Executing a query / parsing responses

Use cases

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

leebenson left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

WebSockets

Multiplexed subscriptions

WeakValueHashMap + broadcast channels

Implicit dropping

Uh oh!

leebenson commented Sep 21, 2020

Uh oh!

leebenson commented Sep 22, 2020

Uh oh!

sghall commented Sep 24, 2020

Uh oh!

leebenson commented Sep 25, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

leebenson commented Sep 9, 2020 •

edited

Loading

leebenson left a comment •

edited

Loading