v1.0.0

@Groxx Groxx released this 26 Apr 19:59

We are v1.0! (with a schema upgrade)

What does this mean?!

Not much. Primarily that we are declaring "it's stable and in use" more visibly, because we continually get questions about this :) A larger public announcement / state-of-the-project is in the works.

Importantly, v1.0 does not imply any change to backwards compatibility (the minimum supported client version has not changed), RPC compatibility (ditto, all changes are backwards compatible), or Go API compatibility (this is not truly a library, so Go compatibility is not a goal).

Going by previous version patterns, this would have been labeled v0.26.0 as it is a relatively incremental change (plus schema changes) from v0.25.0. As such, some strings still reference "0.26", because this older SHA is the one we have been using the most internally.
These strings will be updated and validated soon, and will likely be released as v1.0.1. This should have no behavioral impact at all, but will be visible in metrics, logs, and display strings.

What do I need to do to upgrade?

Schema upgrades needed

There have been schema changes to both normal and visibility datastores, primarily to provide better data for cleanup and hot-shard detection.

These were intentionally kept out of v0.25.0 to keep that upgrade simple, as they were not fully utilized yet.

Replication cache recommendation

We have internally disabled the replication cache (history.replicatorCacheCapacity dynamic config set to 0), due to unexpectedly large memory use under abnormal load, and you may wish to do so as well.

We did not encounter any misbehavior, and it did reduce database load as intended, but we intend to make some changes to it to estimate and constrain memory use before re-enabling.
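
As a rough illustration, if your deployment uses a file-based dynamic config, disabling the cache might look like the entry below. Only the key name and the value 0 come from the recommendation above; the surrounding file layout (a list of value/constraints pairs per key) is an assumption, so check it against your own dynamic config file before applying it.

```yaml
# Sketch only: assumes a file-based dynamic config with one list of
# value/constraints entries per key. Adjust to your actual setup.
history.replicatorCacheCapacity:
- value: 0          # 0 disables the replication cache
  constraints: {}   # assumed: empty constraints apply the value everywhere
```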

What has changed?

At a very high level, we've been focused on:

  • Internal scaling challenges, both relieving bottlenecks and improving our ability to accurately identify them
    • Many metrics, logs, and refactors are at least somewhat related to this
    • Our multi-cluster support is improved in particular, as we have been connecting clusters and moving many domains to spread load more evenly
  • Database corruptions, as our Cassandra clusters have had some problems that caused issues for months
    • Many logs, scanner, and stale-task changes are related to this, e.g. to detect and remove invalid data
  • Scaling up the team
    • More changes to come!

Some loosely categorized PRs that were included follow:

Critical bugfixes (resolving issues in v0.25.0)

Parent-close-policies apply to child workflows even after they reset/continue-as-new/etc

  • Update parent close policy to terminate/cancel child workflows even after continue as new by @Shaddoll in #5032
    • This requires new stored data, so it does not apply to child workflows started before this version.

Better config introspection

Schemas are now available via the go module, as go:embed files

Enhancing existing metrics and logging (and more included in other PRs)

Misc

New Contributors

Full Changelog: v0.25.0...v1.0.0