Implement sharded SQL driver to support using multiple SQL databases #4504

Merged
merged 18 commits into from
Oct 3, 2021

Collaborator

@longquanzheng longquanzheng commented Sep 23, 2021

What changed?
Implement sharded SQL driver to support using multiple SQL databases

Why?
Support using multiple SQL databases so that Cadence can run at larger scale.

How did you test it?
Still working on adding more tests.

Local testing is okay:

(qlong-selector-signal-counter-wrong) $./bin/helloworld
2021-09-24T21:07:06.220-0700	INFO	common/sample_helper.go:109	Logger created.
2021-09-24T21:07:06.220-0700	DEBUG	common/factory.go:151	Creating RPC dispatcher outbound	{"ServiceName": "cadence-frontend", "HostPort": "127.0.0.1:7933"}
2021-09-24T21:07:06.226-0700	INFO	common/sample_helper.go:161	Domain successfully registered.	{"Domain": "samples-domain"}
2021-09-24T21:07:06.272-0700	INFO	common/sample_helper.go:195	Started Workflow	{"WorkflowID": "helloworld_75cf142b-c0de-407e-9115-1d33e9b7551a", "RunID": "98a229b8-8fdd-4d1f-bf41-df00fb06f441"}
qlong@~/indeed/cadence-samples:
(qlong-selector-signal-counter-wrong) $2021-09-24T21:07:06.347-0700	INFO	helloworld/helloworld_workflow.go:31	helloworld workflow started	{"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "16520@IT-USA-25920@helloWorldGroup", "WorkflowType": "helloWorldWorkflow", "WorkflowID": "helloworld_75cf142b-c0de-407e-9115-1d33e9b7551a", "RunID": "98a229b8-8fdd-4d1f-bf41-df00fb06f441"}
2021-09-24T21:07:06.347-0700	DEBUG	internal/internal_event_handlers.go:489	ExecuteActivity	{"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "16520@IT-USA-25920@helloWorldGroup", "WorkflowType": "helloWorldWorkflow", "WorkflowID": "helloworld_75cf142b-c0de-407e-9115-1d33e9b7551a", "RunID": "98a229b8-8fdd-4d1f-bf41-df00fb06f441", "ActivityID": "0", "ActivityType": "main.helloWorldActivity"}
2021-09-24T21:07:06.437-0700	INFO	helloworld/helloworld_workflow.go:62	helloworld activity started	{"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "16520@IT-USA-25920@helloWorldGroup", "ActivityID": "0", "ActivityType": "main.helloWorldActivity", "WorkflowType": "helloWorldWorkflow", "WorkflowID": "helloworld_75cf142b-c0de-407e-9115-1d33e9b7551a", "RunID": "98a229b8-8fdd-4d1f-bf41-df00fb06f441"}
2021-09-24T21:07:06.513-0700	INFO	helloworld/helloworld_workflow.go:55	Workflow completed.	{"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "16520@IT-USA-25920@helloWorldGroup", "WorkflowType": "helloWorldWorkflow", "WorkflowID": "helloworld_75cf142b-c0de-407e-9115-1d33e9b7551a", "RunID": "98a229b8-8fdd-4d1f-bf41-df00fb06f441", "Result": "Hello Cadence!"}

Potential risks
Low risk. This is an optional feature, and there is no behavior change for existing setups.

Release notes

Documentation Changes

@longquanzheng longquanzheng changed the title [WIP] Implement sharded SQL driver Implement sharded SQL driver to support using multiple SQL databases Sep 24, 2021
@coveralls

coveralls commented Sep 25, 2021

Pull Request Test Coverage Report for Build edb37291-69e8-457b-a724-e5a933471be2

  • 48 of 250 (19.2%) changed or added relevant lines in 10 files are covered.
  • 68 unchanged lines in 13 files lost coverage.
  • Overall coverage decreased (-0.05%) to 56.388%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| common/persistence/sql/sqlplugin/postgres/admin.go | 0 | 1 | 0.0% |
| common/persistence/sql/sqlplugin/mysql/db.go | 7 | 9 | 77.78% |
| common/persistence/sql/sqlplugin/postgres/db.go | 6 | 8 | 75.0% |
| common/persistence/sql/sqldriver/driver.go | 5 | 9 | 55.56% |
| common/config/persistence.go | 6 | 36 | 16.67% |
| common/persistence/sql/sqldriver/connections.go | 5 | 35 | 14.29% |
| common/persistence/sql/sqldriver/sharded.go | 0 | 133 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| client/history/client.go | 2 | 40.74% |
| client/history/metricClient.go | 2 | 45.94% |
| common/task/fifoTaskScheduler.go | 2 | 85.57% |
| common/task/weightedRoundRobinTaskScheduler.go | 2 | 89.64% |
| service/history/handler.go | 2 | 47.76% |
| service/history/queue/timer_queue_processor.go | 2 | 59.33% |
| service/matching/matcher.go | 2 | 91.46% |
| common/persistence/nosql/nosqlplugin/cassandra/workflow.go | 3 | 50.23% |
| service/frontend/workflowHandler.go | 4 | 58.59% |
| service/history/execution/mutable_state_task_refresher.go | 7 | 71.92% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 3a2a7565-b4bf-45b3-a266-9286d11b85ae: | -0.05% |
| Covered Lines: | 80238 |
| Relevant Lines: | 142295 |

💛 - Coveralls

@longquanzheng longquanzheng marked this pull request as ready for review September 25, 2021 00:36
@@ -70,7 +70,7 @@ func (mdb *db) CreateSchemaVersionTables() error {
 // ReadSchemaVersion returns the current schema version for the keyspace
 func (mdb *db) ReadSchemaVersion(database string) (string, error) {
 	var version string
-	err := mdb.driver.Get(sqlplugin.DbAllShards, &version, readSchemaVersionQuery, database)
+	err := mdb.driver.Get(sqlplugin.DbDefaultShard, &version, readSchemaVersionQuery, database)
Contributor

why change this?

Collaborator Author

Because it is difficult to implement all-shards reads with the current driver interface. For now I let it read from a single shard (database), and I plan to improve it in a follow-up: #4509

Collaborator Author

If you like, I can also try to implement it here. There are different ways of doing it. The simplest is to send the query to all shards and make sure all of them return the same value (deep equal).

Other ideas: 1) allow setting a shardID when sending the query, or 2) return a set of results instead of a single one. Both would require refactoring the code path.

I feel it's better to do it in a separate PR. What do you think?
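
A minimal sketch of the "query every shard and require deep-equal results" idea, for illustration only. The `shardReader` type, the `readFromAllShards` helper, and the version strings are hypothetical and are not part of the Cadence driver interface; a real implementation would have to fit the existing `driver.Get` code path.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// shardReader is a hypothetical stand-in for "run this query on one shard".
// It is not part of the actual Cadence SQL driver interface.
type shardReader func() (interface{}, error)

// readFromAllShards calls every reader and returns the value they agree on.
// It fails if any shard returns an error or if two shards disagree.
func readFromAllShards(readers []shardReader) (interface{}, error) {
	if len(readers) == 0 {
		return nil, errors.New("no shards configured")
	}
	var first interface{}
	for i, read := range readers {
		v, err := read()
		if err != nil {
			return nil, fmt.Errorf("shard %d: %w", i, err)
		}
		if i == 0 {
			first = v
			continue
		}
		// Every later shard must match the value returned by shard 0.
		if !reflect.DeepEqual(first, v) {
			return nil, fmt.Errorf("shard %d returned %v, shard 0 returned %v", i, v, first)
		}
	}
	return first, nil
}

func main() {
	// Illustrative values only: two shards reporting the same schema version.
	readers := []shardReader{
		func() (interface{}, error) { return "0.4", nil },
		func() (interface{}, error) { return "0.4", nil },
	}
	v, err := readFromAllShards(readers)
	fmt.Println(v, err)
}
```

The sketch queries shards sequentially and stops at the first failure or mismatch; whether a mismatch should be an error or a retry is a design decision left open here.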

Contributor

Sure, let's do it in a separate PR.

Contributor

@antstorm antstorm left a comment

This looks great, can't wait to try it out!

return fmt.Errorf("sql persistence config: connectAddr can only be configured ini multipleDatabasesConfig when UseMultipleDatabases is true ")
}
if ds.SQL.User != "" {
return fmt.Errorf("sql persistence config: user can only be configured ini multipleDatabasesConfig when UseMultipleDatabases is true ")
Contributor

typo: ini -> in

Collaborator Author

good catch. Thanks!

return fmt.Errorf("sql persistence config: password can only be configured ini multipleDatabasesConfig when UseMultipleDatabases is true ")
}
if ds.SQL.NumShards <= 1 || len(ds.SQL.MultipleDatabasesConfig) != ds.SQL.NumShards {
return fmt.Errorf("sql persistence config: nShards must be greater than one and equal to the length of multipleDatabasesConfig")
Contributor

nit: I wonder if we need NumShards or can just rely on length of MultipleDatabasesConfig instead?

Collaborator Author

NumShards is needed when Cadence uses a sharded SQL solution (like Uber's docStore, or CockroachDB/DorisDB), so we can't remove it. In that case there is no MultipleDatabasesConfig.

I was thinking of defaulting NumShards to the length of MultipleDatabasesConfig when it is not set. But the doc would be a little confusing: when MultipleDatabasesConfig is set, the default is its length; when it is not set, the default is 1.

On the other hand, NumShards is a critical config, and stating it explicitly makes the config easier to understand in review (rather than counting the length of the array).

So I am not sure we should complicate that, but I'm open to more ideas.
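
For reference, the defaulting alternative described above could look roughly like the sketch below. This is not what the PR implements (the PR requires NumShards to be set explicitly). The `sqlConfig` struct is a simplified, hypothetical stand-in for the real config type, `resolveNumShards` is an invented helper, and treating 0 as "not set" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlConfig is a simplified, hypothetical stand-in for the SQL persistence
// config. Only the fields discussed in this thread are modeled.
type sqlConfig struct {
	UseMultipleDatabases    bool
	NumShards               int      // 0 is treated as "not set" (an assumption)
	MultipleDatabasesConfig []string // one entry per database, reduced to an address
}

// resolveNumShards applies the defaulting rule under discussion:
// an unset NumShards falls back to len(MultipleDatabasesConfig) when multiple
// databases are used, and to 1 otherwise.
func resolveNumShards(c sqlConfig) (int, error) {
	if c.NumShards < 0 {
		return 0, errors.New("NumShards must not be negative")
	}
	if !c.UseMultipleDatabases {
		// Single database, or a natively sharded SQL store configured by NumShards alone.
		if c.NumShards == 0 {
			return 1, nil
		}
		return c.NumShards, nil
	}
	n := c.NumShards
	if n == 0 {
		n = len(c.MultipleDatabasesConfig)
	}
	// Same constraint as the validation in this PR.
	if n <= 1 || n != len(c.MultipleDatabasesConfig) {
		return 0, errors.New("NumShards must be greater than one and equal to the length of MultipleDatabasesConfig")
	}
	return n, nil
}

func main() {
	// Illustrative addresses only.
	cfg := sqlConfig{
		UseMultipleDatabases:    true,
		MultipleDatabasesConfig: []string{"127.0.0.1:3306", "127.0.0.1:3307"},
	}
	n, err := resolveNumShards(cfg)
	fmt.Println(n, err)
}
```

The two different defaults (1 versus the array length) are visible as two separate branches, which is the documentation complexity the comment above is concerned about.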

Collaborator Author

I merged it now since this is a nit.
Feel free to make suggestions, though. I will open a follow-up PR to add tests and keep improving it.

@longquanzheng longquanzheng merged commit f5ce7cb into master Oct 3, 2021
@longquanzheng longquanzheng deleted the qlong-multi-sql branch October 3, 2021 04:04