Skip to content

enhancement(observability): GraphQL client#3778

Merged
leebenson merged 63 commits intomasterfrom
leebenson/graphql-client
Sep 25, 2020
Merged

enhancement(observability): GraphQL client#3778
leebenson merged 63 commits intomasterfrom
leebenson/graphql-client

Conversation

@leebenson
Copy link
Copy Markdown
Contributor

@leebenson leebenson commented Sep 9, 2020

Adds graphql-client, for typed GraphQL queries against the API server implemented in #3708. Currently targets that same PR for visibility, and is a draft for now.

It comprises the following elements:

Schema generation

#3708 introduces async-graphql as a code-first implementation of GraphQL schema.

Since graphql-client validates queries at compile-time, it requires a reference to the schema file (either GraphQL SDL, or the JSON equivalent). Async-graphql doesn't generate this out of the box.

I've added a generator in src/api/schema/gen.rs, which builds a runtime version of the schema, issues an introspection query against it, and serializes the response to JSON. It can be executed with:

cargo run --bin graphql-schema

The resulting file will be saved to graphql/schema.json, relative to the Cargo.toml root.

This file is intended to be checked in. In Alloy, we had a CI test to ensure the checked in version matches the latest schema, by doing a git diff. I think it'd make sense to add one here too, to guarantee all queries are up to date.

Querying

To build a query, we derive from graphql-client's GraphQLQuery type, and provide links to the GraphQL schema, and the individual .graphql file containing the query:

#[derive(GraphQLQuery)]
#[graphql(
    schema_path = "graphql/schema.json",
    query_path = "graphql/queries/health.graphql",
    response_derives = "Debug"
)]
struct HealthQuery;

In this example, grapql/queries/health.graphql contains:

query HealthQuery {
  health
}

The name of the struct must match the text label provided in the query. Multiple queries can optionally be combined into a single .graphql (generally not relevant/useful), and the macro will expand the correct query.

The validity of the query is asserted by matching against the schema. Any errors reflect during build, e.g if we change health -> healthz, we get:

error: proc-macro derive panicked
  --> tests/api.rs:16:14
   |
16 |     #[derive(GraphQLQuery)]
   |              ^^^^^^^^^^^^
   |
   = help: message: Code generation failed.
           could not find field `healthz`

Executing a query / parsing responses

Graphql-client builds the query, variables and response types. The transport is userland. In tests/api.rs, I'm using reqwest per:

async fn query<T: GraphQLQuery>(
    request_body: &graphql_client::QueryBody<T::Variables>,
) -> graphql_client::Response<T::ResponseData> {
    let config = api_enabled_config();
    let addr = config.api.bind.unwrap();
    let url = format!("http://{}:{}/graphql", addr.ip(), addr.port());

    start_api_server(addr, false);

    let client = reqwest::Client::new();

    retry_until(
        || client.post(&url).json(&request_body).send(),
        Duration::from_millis(100),
        Duration::from_secs(10),
    )
    .await
    .json()
    .await
    .unwrap()
}

#[tokio::test]
async fn api_graphql_health() {
    let request_body = HealthQuery::build_query(health_query::Variables);
    let res = query::<HealthQuery>(&request_body).await;

    assert_matches!(
        res,
        graphql_client::Response {
            data: Some(health_query::ResponseData { health: true }),
            errors: None,
        }
    );
}

This:

  • Builds the GraphQL query, from the generated HealthQuery struct
  • Spawns the API server
  • Uses reqwest to POST the request body to the /graphql endpoint
  • Validates against the typed health_query::ResponseData, containing health: true.

Use cases

Currently, this is only used in API integration tests. My intention is for us to use this with vector top to issue the same queries that the UI will be issuing (albeit with its own, separate code gen + client libs) and anywhere else we need to make compile-time guarantees against API data -- metrics and otherwise.

Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
@leebenson leebenson added the domain: observability Anything related to monitoring/observing Vector label Sep 9, 2020
@leebenson leebenson self-assigned this Sep 9, 2020
Signed-off-by: Lee Benson <lee@leebenson.com>
Comment thread Cargo.toml
default-run = "vector"

[[bin]]
name = "graphql-schema"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced this for running schema generation via Cargo:

cargo run --bin graphql-schema

I looked for an existing pattern prior to doing this, to see if there are other entrypoints / helper bins running elsewhere, which didn't yield anything obvious. It doesn't really fit a workspace candidate for lib -- it's not really a library, nor does it provide an obvious way to call back into parent deps.

If there is a preferred approach to running helpers that aren't part of the main vector bin (which might include where to keep the source), would appreciate the pointer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend for this to be a build time thing? It could be part of a build script and run on changed with https://doc.rust-lang.org/cargo/reference/build-scripts.html ?

@@ -0,0 +1,3 @@
query HealthQuery {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All queries (and mutations/subscriptions) are placed in .graphql files, which are validated at compile time and embedded in the bin as strings.

Comment thread src/api/schema/gen.rs
use vector::api::schema::build_schema;

static INTROSPECTION_QUERY: &str = r#"
query IntrospectionQuery {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the full introspection query that retrieves all parts of the GraphQL schema, including a few of the features we don't currently use (like directives) but may in future.

Comment thread src/api/schema/gen.rs
#[tokio::main]
async fn main() {
let schema = build_schema().finish();
let res = schema.execute(INTROSPECTION_QUERY).await;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This executes the introspection query against the live schema, outside of the Warp instantiation. This is interesting IMO, because it shows how we could bypass HTTP altogether and add a middle tier for marshalling requests via essentially any protocol. HTTP isn't a hard requirement!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be used by things like our wasm engine. 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphql-client is WASM-compatible; all queries, subscriptions, (de)serialisation of payloads etc should work.

I used tokio-tungstenite as the WebSocket client and async-graphql as the GraphQL server. I'm not sure if/how we'd intend to use these in wasm, but there's probably also the option of leveraging the native browser APIs for WebSockets.

Let me know if you have any specific use-cases in mind for leveraging GraphQL with WASM-- interesting!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely want to expose a query(some_graphql_string) via the hostcall API. :) I'll do that later though!

Comment thread src/api/schema/gen.rs Outdated

let json = serde_json::to_string_pretty(&async_graphql::http::GQLResponse(res)).unwrap();

fs::write("graphql/schema.json", json).expect("Couldn't save schema file");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stores the schema at graphql/schema.json, based on the cwd. I'm expecting this would typically be invoked via cargo run and therefore relative to Cargo.toml. I'm also assuming that the graphql dir actually exists, which it should by default (since this file is checked in.)

If we ever want to deploy this separately (say, in a Docker image) or handle cases where the dir structure might not match, then a mkdir -p approach + CLI opts would be an obvious upgrade path. Kept this simple for now.

Copy link
Copy Markdown
Contributor Author

@leebenson leebenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken this out of draft - it should be good to review again.

The main change vs. the last time is the handling of streaming GraphQL subscriptions. I added a SubscriptionClient that aims to work in a similar fashion to the GraphQL client libraries in Javascript, such as Apollo's subscription-transport-ws.

Some notes on how this is specifically implemented follow:

WebSockets

GraphQL subscriptions are typically transported over WebSockets. This allows them to interface natively with modern browsers, and provides bi-directional comms -- required for the client to issue subscription requests, and receive arbitrary payloads.

Multiplexed subscriptions

The WebSocket connection is intended to be long-lived. It can handle multiple disparate GraphQL subscriptions. Control messages and payloads are delivered over a single connection. An { id: Uuid } is attached to each subscription { type: "start" } request (in this client; marshals to a String), and the same ID is included in payloads. This enables the client to determine how to route payloads.

WeakValueHashMap + broadcast channels

I use weak_table's WeakValueHashMap to store subscriptions against the client. The caller is intended to control the lifetime of the subscription; when it goes out of scope, the subscription is removed from the client.

As payloads are received, their { id } is checked and routed to the appropriate Subscription, mapped against their Uuid.

Implicit dropping

When SubscriptionClient goes out of scope, the connection is terminated. The GraphQL server should cancel all active subscriptions.

When an individual Subscription goes out of scope, its Drop implementation sends an { id, type: "stop" } message to the GraphQL client, which should terminate that one subscription and keep others running.


@sghall, tagging you on this too, as both a sanity check against the equivalent implementation in Urql (making sure control messages, etc are right-- I'll do another sweep myself too) since I know we've both been in the weeds with the WS network tab in Dev Tools... and for general edification for how this PR generates queries/subscriptions, similar to GraphQL Code Generator in TS.

I imagine you'll want to play around generating queries yourself once we're at that point in the UI, so just mostly just a heads-up on this initial approach.

@leebenson leebenson requested a review from sghall September 18, 2020 13:52
Signed-off-by: Lee Benson <lee@leebenson.com>
@leebenson
Copy link
Copy Markdown
Contributor Author

A note to reviewers: I'll be adding tests today for:

  • Events processed
  • Uptime
  • Multiple requests in flight in parallel

This is still good to review, but FYI there will be additional tests to look at shortly.

Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
@leebenson
Copy link
Copy Markdown
Contributor Author

I've implemented all the tests that I think are relevant to the scope of this PR, including the various health checks, heartbeats, uptime metrics, and events processed.

@Hoverbear, @JeanMertz -- this wound up being a sizeable PR conceptually, as well as LOC, so I know there's a fair chunk to review here. I appreciate you taking the time to look at this.

Further evolution of the schema should hopefully happen in smaller steps, now that the API server + client are out of the way.

If you have any questions or want me to go into further detail on any of this, happy to walk through on git/chat/zoom/loom as preferred. Thanks!

Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Lee Benson <lee@leebenson.com>
@sghall
Copy link
Copy Markdown
Member

sghall commented Sep 24, 2020

@leebenson taking a look at this PR. Really great stuff here!

Just wanted to note that I can occasionally get the api tests to fail locally on my mac. I tried my best to figure it out, but to no avail 😞 I'm logging out that "seconds: 0" you see in the output (when it passes seconds == 3). That's in the new_uptime_subscription function in the api tests. Seems really random. Have to run it ~5-10 times to get it to fail. It's always these same two tests.

stevenhall@steves-macbook vector % cargo test --test api -- --nocapture
    Finished test [unoptimized + debuginfo] target(s) in 0.37s
     Running target/debug/deps/api-491a23c67c646c19

running 8 tests
test tests::api_playground_disabled ... ok
test tests::api_playground_enabled ... ok
test tests::api_health ... ok
test tests::api_graphql_health ... ok
test tests::api_graphql_event_processed_metrics ... ok
test tests::api_graphql_heartbeat ... ok
seconds: 0
thread 'tests::api_graphql_combined_heartbeat_uptime' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test tests::api_graphql_combined_heartbeat_uptime ... FAILED
seconds: 0
thread 'tests::api_graphql_uptime_metrics' panicked at 'assertion failed: seconds > num_results as f64 - 1.0', tests/api.rs:224:9
test tests::api_graphql_uptime_metrics ... FAILED

failures:

failures:
    tests::api_graphql_combined_heartbeat_uptime
    tests::api_graphql_uptime_metrics

test result: FAILED. 6 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

@leebenson
Copy link
Copy Markdown
Contributor Author

Thanks for checking it out, and surfacing those tests @sghall. Looks like it's 'uptime' in both of those tests. Each test initialises metrics with:

// Initialize the metrics system. Idempotent.
fn init_metrics() {
    METRIC_INIT.call_once(|| {
        let _ = vector::metrics::init();

        // Spawn the internal 'heartbeat' event, which triggers uptime
        tokio::spawn(heartbeat::heartbeat());
    })
}

With heartbeat::heartbeat() being:

/// Emits Heartbeat event every second.
pub async fn heartbeat() {
    let since = Instant::now();
    let mut interval = interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        emit!(Heartbeat { since });
    }
}

It could be that the GraphQL subscription query returns '0' on its first run and '0' on the second, firing faster than the interval is able to emit.

I'll update the streaming interval to be 1,200ms for GraphQL to cover that possibility.

Thanks for catching!

@leebenson leebenson merged commit ad77d20 into master Sep 25, 2020
@leebenson leebenson deleted the leebenson/graphql-client branch September 25, 2020 07:31
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
* wip

Signed-off-by: Lee Benson <lee@leebenson.com>

* API v2

Signed-off-by: Lee Benson <lee@leebenson.com>

* -println

Signed-off-by: Lee Benson <lee@leebenson.com>

* happy clippy

Signed-off-by: Lee Benson <lee@leebenson.com>

* copyable playground url

Signed-off-by: Lee Benson <lee@leebenson.com>

* loop

Signed-off-by: Lee Benson <lee@leebenson.com>

* ohNo

* fix conflicts

Signed-off-by: Lee Benson <lee@leebenson.com>

* fix config tests

Signed-off-by: Lee Benson <lee@leebenson.com>

* stop breaking things

Signed-off-by: Lee Benson <lee@leebenson.com>

* separate metrics

Signed-off-by: Lee Benson <lee@leebenson.com>

* 3x metrics

Signed-off-by: Lee Benson <lee@leebenson.com>

* health and playground tests

Signed-off-by: Lee Benson <lee@leebenson.com>

* graphql client; health test

Signed-off-by: Lee Benson <lee@leebenson.com>

* start server

Signed-off-by: Lee Benson <lee@leebenson.com>

* required-features

Signed-off-by: Lee Benson <lee@leebenson.com>

* Update Cargo.lock

* server shutdown

Signed-off-by: Lee Benson <lee@leebenson.com>

* fix API server scope

Signed-off-by: Lee Benson <lee@leebenson.com>

* drop(server)

Signed-off-by: Lee Benson <lee@leebenson.com>

* fix events/mod

* minor cleanup

Signed-off-by: Lee Benson <lee@leebenson.com>

* lint

Signed-off-by: Lee Benson <lee@leebenson.com>

* wip

* heartbeat

Signed-off-by: Lee Benson <lee@leebenson.com>

* retry_until

* check-internal-events

Signed-off-by: Lee Benson <lee@leebenson.com>

* working client

* check-fmt

Signed-off-by: Lee Benson <lee@leebenson.com>

* heartbeat assertions

Signed-off-by: Lee Benson <lee@leebenson.com>

* gen newline

* drop moot helper

* close -> stop

Signed-off-by: Lee Benson <lee@leebenson.com>

* uptime metrics

Signed-off-by: Lee Benson <lee@leebenson.com>

* events processed test

Signed-off-by: Lee Benson <lee@leebenson.com>

* parallel subscriptions

Signed-off-by: Lee Benson <lee@leebenson.com>

* minor use

Signed-off-by: Lee Benson <lee@leebenson.com>

* extra commentary

Signed-off-by: Lee Benson <lee@leebenson.com>

* fix tests

Signed-off-by: Lee Benson <lee@leebenson.com>
Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: observability Anything related to monitoring/observing Vector

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants