Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not abort when unrelated connections are not yet loaded #4233

Conversation

pdecat
Copy link
Contributor

@pdecat pdecat commented Apr 8, 2024

This PR resolves issues when connection configurations are created dynamically concurrently to executing queries.

Symptoms are that an unrelated connection that hasn't completed loading breaks other connections:

2024-04-05 12:05:19.955 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_000000000000', table 'aws_ec2_transit_gateway', explain: false
2024-04-05 12:05:19.957 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:19.957 UTC [299] ERROR:  failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:19.957 UTC [299] STATEMENT:  SELECT 'APIGATEWAY' AS subtype, row_to_json(r) AS data FROM (SELECT
[...SNIP...]
2024-04-05 12:05:19.958 UTC [INFO]  hub: goFdwAbortCallback
2024-04-05 12:05:19.958 UTC [INFO]  hub: Hub Abort
2024-04-05 12:05:20.023 UTC [INFO]  hub: goFdwIterateForeignScan returned empty row - this scan complete (0xc00226f080)
2024-04-05 12:05:20.025 UTC [INFO]  hub: goFdwIterateForeignScan calling pluginHub.StartScan, table 'aws_account' (0xc00226f680)
2024-04-05 12:05:20.025 UTC [INFO]  hub: StartScan for table: aws_account, cache enabled: false, iterator 0xc00226f680, 0 quals (1712318705836)
2024-04-05 12:05:20.026 UTC [INFO]  hub: goFdwIterateForeignScan returned empty row - this scan complete (0xc00226f680)
2024-04-05 12:05:20.028 UTC [INFO]  hub: goFdwReScanForeignScan, connection '', table 'aws_ec2_transit_gateway'
2024-04-05 12:05:20.028 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_111111111111', table 'aws_ec2_transit_gateway', explain: false
2024-04-05 12:05:20.030 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws@0.133.0': no connection config loaded for 'aws_222222222222'
2024-04-05 12:05:20.030 UTC [INFO]  hub: .
******************************************************

                steampipe postgres fdw shutdown

******************************************************

This problem seems to happen more frequently when connections are slower to load, most notably when limiters are configured.

See https://turbot-community.slack.com/archives/C01UECB59A7/p1712324097931309

For this change to be effective, the FDW has to be rebuilt with:

go mod edit -replace="github.com/turbot/steampipe=github.com/pdecat/steampipe @0a5b8a0928292f08ad842a63211188306e9998de"

pdecat added a commit to pdecat/steampipe-4233-repro that referenced this pull request Apr 8, 2024
@pdecat
Copy link
Contributor Author

pdecat commented Apr 8, 2024

Submitted a reproduction case: https://github.com/pdecat/steampipe-4233-repro

@pdecat pdecat force-pushed the fix/do_not_abort_when_unrelated_connection_not_loaded branch from 0a5b8a0 to 68c3fe7 Compare April 10, 2024 12:27
@pdecat pdecat force-pushed the fix/do_not_abort_when_unrelated_connection_not_loaded branch 2 times, most recently from 38b3446 to 3bcdc14 Compare April 26, 2024 14:54
@pdecat pdecat force-pushed the fix/do_not_abort_when_unrelated_connection_not_loaded branch from 3bcdc14 to 4b663e0 Compare May 3, 2024 13:31
@pdecat pdecat force-pushed the fix/do_not_abort_when_unrelated_connection_not_loaded branch 2 times, most recently from 0e72e8c to fc45610 Compare May 17, 2024 13:17
@pdecat pdecat force-pushed the fix/do_not_abort_when_unrelated_connection_not_loaded branch from fc45610 to e873694 Compare May 23, 2024 08:32
@kaidaguerre
Copy link
Contributor

hi @pdecat
sorry for the long delay on this one. I actually spent a bunch of time digging into this one to try and identify the underlying reasons, but I failed/ran out of time and had to move on to other stuff.

Came back to this today and I think it's a fine tactical fix for now. I'd still like to get to the bottom of it but I'll merge in the meantime.

@kaidaguerre kaidaguerre merged commit 7cb8edd into turbot:main Jun 6, 2024
32 checks passed
@pdecat pdecat deleted the fix/do_not_abort_when_unrelated_connection_not_loaded branch June 6, 2024 12:22
@pdecat
Copy link
Contributor Author

pdecat commented Jun 6, 2024

Hi @kaidaguerre, no problem for the delay, been using my own fork until now :)

Thanks! 🎉

@pdecat pdecat restored the fix/do_not_abort_when_unrelated_connection_not_loaded branch June 14, 2024 12:54
@pdecat
Copy link
Contributor Author

pdecat commented Jul 12, 2024

The issue described in the PR description reproduces with HEAD of v0.23.x branch with FDW 1.11.3:

2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwReScanForeignScan, connection '', table 'aws_account'
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, connection 'aws_000000000000', table 'aws_account', explain: false
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, canPushdownAllSortFields true
2024-07-12 15:04:34.927 UTC [INFO]  hub: goFdwBeginForeignScan, table 'aws_account', sortOrder: []
2024-07-12 15:04:34.927 UTC [INFO]  hub: RemoteHub startScanForConnection 'aws_000000000000' limit -1
2024-07-12 15:04:34.947 UTC [WARN]  hub: pluginHub.GetIterator FAILED: failed to attach to plugin process for 'aws': no connection config loaded for 'aws_111111111111'
2024-07-12 15:04:34.947 UTC [INFO]  hub: buildConnectionLimitMap, table: 'aws_account', 1 connection, limit: -1
2024-07-12 15:04:34.947 UTC [INFO]  hub: connection 'aws_222222222222', table 'aws_account', quals
----------------------------------------------------------------
Column: account_id, Operator: '=', Value: '222222222222'
----------------------------------------------------------------
2024-07-12 15:04:34.947 UTC [300] ERROR:  failed to attach to plugin process for 'aws': no connection config loaded for 'aws_111111111111'
2024-07-12 15:04:34.947 UTC [300] STATEMENT:  SELECT 'APIGATEWAY' AS subtype, row_to_json(r) AS data FROM (SELECT
[...SNIP...]
2024-07-12 15:04:34.948 UTC [INFO]  hub: goFdwIterateForeignScan calling pluginHub.StartScan, table 'aws_account' Current timestamp: 774111873000000 (0xc000d0fc00)
2024-07-12 15:04:34.948 UTC [INFO]  hub: hubBase StartScan
2024-07-12 15:04:34.948 UTC [INFO]  hub: build executeConnectionData map: hub: 0xc0005da9b0, i.connectionLimitMap: map[aws_222222222222:-1]
2024-07-12 15:04:34.948 UTC [INFO]  hub: build executeConnectionData map returning table:"aws_account" query_context:{columns:"account_id" quals:{key:"account_id" value:{quals:{fieldName:"account_id" string_value:"=" value:{string_value:"222222222222"}}}}} connection:"aws_222222222222" call_id:"1720796674139" trace_context:{value:"{}"} executeConnectionData:{key:"aws_222222222222" value:{cache_enabled:true cache_ttl:300}}
2024-07-12 15:04:34.948 UTC [INFO]  hub: StartScan for table: aws_account, cache enabled: false, iterator 0xc000d0fc00, 1 quals (1720796674139)
2024-07-12 15:04:30.613 UTC [INFO]  hub: goFdwAbortCallback
2024-07-12 15:04:34.948 UTC [INFO]  hub: AddScanMetadata for iterator 0xc000a08700 query timestamp 774111852000000 (aws_000000000000)
2024-07-12 15:04:34.948 UTC [INFO]  hub: removeStaleScanMetadata for current query timestamp 774111852000000
[...SNIP...]
2024-07-12 15:04:38.418 UTC [INFO]  hub: .
******************************************************

                steampipe postgres fdw shutdown

******************************************************

@pdecat
Copy link
Contributor Author

pdecat commented Jul 17, 2024

Using strings on the official v1.11.3 build of FDW, I can verify that my changes are not included as this first command should have returned something:

# strings .steampipe/db/14.2.0/postgres/lib/postgresql/steampipe_postgres_fdw.so | grep -F "[WARN] no connection config loaded for '"

If I remove the [WARN] prefix, I see it is the old version:

# strings .steampipe/db/14.2.0/postgres/lib/postgresql/steampipe_postgres_fdw.so | grep -F "no connection config loaded for '"
accessed data from freed user arena runtime: wrong goroutine in newstackruntime: invalid pc-encoded table f=trace: non-empty global trace buffer[TRACE] plugin client created for %sno connection config loaded for '%s'positional arg index %d out of range444089209850062616169452667236328125ryuFtoaFixed64 called with prec > 180123456789abcdefghijklmnopqrstuvwxyzmethod ABI and value ABI don't alignreflect.Value.Equal: values of type Channel switches to new LB policy %qSubchannel Connectivity change to %vnil receiver passed to UnmarshalJSONgoogle/protobuf/source_context.protocannot parse '%s', %d overflows uintcannot parse '%s', %f overflows uintcompressed name in SRV resource datareflect: NumField of non-struct typehttp: no Location header in responsehttp: unexpected EOF reading trailerhttp: invalid byte %q in Cookie.Path LastStreamID=%v ErrCode=%v Debug=%qRoundTrip retrying after failure: %vno acceptable authentication methodsinvalid semicolon separator in queryA static map expression is required.can't setValue for a nil EvalContextInvalid template interpolation valueTemplateJoinExpr got non-tuple tupleMissing close bracket on splat indexcty.Capsule(%q, reflect.TypeOf(%#v))%d elements are required, but got %dunsupported value type %#v in Equalselement key for tuple must be numberwrong number of elements %d; need %d6ba7b810-9dad-11d1-80b4-00c04fd430c86ba7b811-9dad-11d1-80b4-00c04fd430c86ba7b812-9dad-11d1-80b4-00c04fd430c86ba7b814-9dad-11d1-80b4-00c04fd430c8%v: MessageSet with no unknown fieldinvalid gRPC request content-type %qpeer header list size exceeded limittransport: malformed grpc-status: %vtransport: malformed http-status: %vfailed to write the HTTP request: %vcannot merge into invalid %v messageduration (%v) has out-of-range nanosinvalid pattern syntax (+ after -): expected an ECDSA public key, got %TTLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHATLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHAtls: keys must have at least one keyunsupported SSLv2 handshake receivedtls: server did not send a key sharex509: zero or negative DSA parameterx509: invalid CRL distribution pointx509: invalid subject key identifierx509: malformed algorithm identifierx509: invalid X25519 private key: %vx509: serial number must be positive  Owner: %d [%s] (current: %d [%s])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants