Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google Big Query Connector #1566

Merged
merged 34 commits into from
May 4, 2022
Merged

Google Big Query Connector #1566

merged 34 commits into from
May 4, 2022

Conversation

ramonacat
Copy link
Collaborator

@ramonacat ramonacat commented Apr 21, 2022

Signed-off-by: Ramona Luczkiewicz rluczkiewicz@wayfair.com

Pull request

Description

This of course is a very early prototype that does not do anything useful, except for writing the same row over and over.
You can get the token by running gcloud auth print-access-token (this will be replaced with proper google cloud auth).

define flow gbqtest
flow
    define pipeline passthrough
    pipeline
        select event from in into out;
    end;

    define connector gbq from gbq
    with
        config = {
            "token": "...",
            "table_id": "projects/tremor/datasets/test/tables/streaming_test"
        }
    end;

    define connector metronome from metronome
    with
        config = {
            "interval": 5000000000
        }
    end;

    create connector metronome;
    create connector gbq;

    create pipeline passthrough;

    connect /connector/metronome to /pipeline/passthrough;
    connect /pipeline/passthrough to /connector/gbq;
end;

deploy flow gbqtest;

Related

  • RFC
  • Related Issues: fixes #000, closed #000
  • Related docs PR

Checklist

  • The RFC, if required, has been submitted and approved
  • Any user-facing impact of the changes is reflected in docs.tremor.rs
  • The code is tested
  • Use of unsafe code is reasoned about in a comment
  • Update CHANGELOG.md appropriately, recording any changes, bug fixes, or other observable changes in behavior
  • The performance impact of the change is measured (see below)

Performance

@codecov
Copy link

codecov bot commented Apr 21, 2022

Codecov Report

Merging #1566 (75a4abd) into main (47588fd) will increase coverage by 0.15%.
The diff coverage is 85.75%.

@@            Coverage Diff             @@
##             main    #1566      +/-   ##
==========================================
+ Coverage   85.46%   85.61%   +0.15%     
==========================================
  Files         230      232       +2     
  Lines       45996    47014    +1018     
==========================================
+ Hits        39309    40250     +941     
- Misses       6687     6764      +77     
Flag Coverage Δ
unittests 85.61% <85.75%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/connectors/impls/gbq/writer.rs 39.13% <39.13%> (ø)
src/connectors/impls/gbq/writer/sink.rs 84.89% <84.89%> (ø)
src/connectors.rs 86.06% <100.00%> (+0.02%) ⬆️
src/errors.rs 80.47% <100.00%> (+1.06%) ⬆️
src/preprocessor/decompress.rs 88.03% <100.00%> (+18.93%) ⬆️
tremor-pipeline/src/lib.rs 85.22% <100.00%> (+1.72%) ⬆️
tremor-script/src/std_lib/win.rs 83.83% <100.00%> (+10.98%) ⬆️
tremor-value/src/value.rs 98.90% <0.00%> (-0.10%) ⬇️
src/pipeline.rs 76.94% <0.00%> (+0.27%) ⬆️
src/connectors/source.rs 81.74% <0.00%> (+1.03%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 47588fd...75a4abd. Read the comment docs.

@coveralls
Copy link
Collaborator

coveralls commented Apr 21, 2022

Pull Request Test Coverage Report for Build 2269936374

  • 873 of 1018 (85.76%) changed or added relevant lines in 7 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 85.613%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/connectors/impls/gbq/writer.rs 9 23 39.13%
src/connectors/impls/gbq/writer/sink.rs 736 867 84.89%
Files with Coverage Reduction New Missed Lines %
tremor-value/src/value.rs 1 98.9%
Totals Coverage Status
Change from base Build 2268493504: 0.2%
Covered Lines: 40250
Relevant Lines: 47014

💛 - Coveralls

src/connectors.rs Outdated Show resolved Hide resolved
@ramonacat
Copy link
Collaborator Author

Left to do:

  1. Properly handle authorization - we already depend on the gouth crate, so will use that
  2. Handle disconnections/reconnections
  3. Do not panic if the JSON types do not match the expected types - I think we should just treat this as an error, we can always add automated type conversions later, if a need arises, but it's hard to make changes to that logic once it's in place (without breaking backwards compatibility). Thoughts?

@ramonacat ramonacat changed the title Google Big Query Connector - Early prototype Google Big Query Connector Apr 26, 2022
trace_id: "".to_string(),
rows: Some(append_rows_request::Rows::ProtoRows(ProtoData {
writer_schema: Some(ProtoSchema {
proto_descriptor: Some(mapping.descriptor().clone()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only needs to be provided upon the first request (after connect). We could store the full ProtoSchema in an Option<ProtoSchema> and extract it with a self.proto_schema.take() here and have it be a None all the time, saving 1 clone. :)
We could do the same with trace_id if we would have it configured on the connector.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is indeed what the docs say. But appraently it's not true:

[2022-04-29T11:47:12Z ERROR tremor_runtime::connectors::impls::gbq::writer::sink] BigQuery error: status: InvalidArgument, message: "Expect a user schema to be passed in Entity: projects/.../datasets/test/tables/streaming_test/streams/...", details: [], metadata: MetadataMap { headers: {} 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, very interesting... :oldmanshakesfistatcloud:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that the client we use under the hood close the connection for each request and that's why we need to re-send it? GRPC does favour long lived connections.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underlying client is tonic, which is pretty good about handling long-lived connections, as far as I know. On the other hand - we're using HTTP here, so I expect the connection might not survive forever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh if it's HTTP not HTTP2 then that explains it, http will be counted as 1 connection per request, using http2/quick would probably solve that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to use grpc status codes for this purpose - perhaps thats something worth investigating and optimising in a separate ticket ( this would impact otel similarly and in future other grpc endpoints )

Ramona Luczkiewicz added 8 commits May 4, 2022 14:27
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Ramona Luczkiewicz and others added 18 commits May 4, 2022 14:27
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
@ramonacat ramonacat marked this pull request as ready for review May 4, 2022 12:27
@ramonacat ramonacat requested a review from anupdhml as a code owner May 4, 2022 12:27
Licenser
Licenser previously approved these changes May 4, 2022
Copy link
Member

@Licenser Licenser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
darach
darach previously approved these changes May 4, 2022
Copy link
Member

@darach darach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🛩️

Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Signed-off-by: Ramona Luczkiewicz <rluczkiewicz@wayfair.com>
Copy link
Member

@Licenser Licenser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 great work!

@Licenser Licenser merged commit 9eed714 into tremor-rs:main May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants