cartridge: expose operation last error to issues #74

Closed

Conversation

@DifferentialOrange (Member) commented Apr 12, 2024

Expose last operation error to Cartridge issues.

[screenshot: issues list]
(Issues are also exposed to the default Grafana dashboard, as well as the default alerts.)

[screenshot: issue message]
(The error message could be improved, but it has always been like this: I haven't changed anything here in this patch.)

The original issue was about exposing migrations inconsistency from the new migrations tab to Cartridge issues as well. But a straightforward approach is rather bad: checking inconsistency is a full cluster map-reduce operation, and, if exposed to get_issues, it would emit N^2 network requests, since issues are collected from each instance, there is no way to check whether migrations are consistent without a cluster map-reduce, and there is no distinct migrator provider -- any instance is a migration provider. And, since get_issues may trigger rather often, having such a feature may make the cluster unhealthy (we already had similar things with metrics [1]). The last error is reset on each operation call.

  1. tnt_cartridge_issues gather only local issues (tarantool/metrics#243)

Closes #73

Expose last operation error to Cartridge issues.

The original issue was about exposing migrations inconsistency from
the new `migrations` tab to Cartridge issues as well. But a
straightforward approach is rather bad: checking inconsistency is
a full cluster map-reduce operation, and, if exposed to `get_issues`,
it would emit N^2 network requests, since issues are collected from
each instance, there is no way to check whether migrations are
consistent without a cluster map-reduce, and there is no distinct
migrator provider -- any instance is a migration provider. And, since
`get_issues` may trigger rather often, having such a feature may make
the cluster unhealthy (we already had similar things with metrics
[1]). The last error is reset on each operation call.

1. tarantool/metrics#243

Closes #73
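
As a rough illustration of the approach described above, here is a minimal sketch. It assumes a role-level `get_issues` callback that is collected per instance; all names (`apply_migrations`, `last_error`, the migration name) are hypothetical and do not necessarily match the actual patch.

```lua
-- Minimal sketch, not the actual patch: cache the last operation error
-- per instance and report it through an issues callback.

-- Placeholder for the real migration apply step.
local function apply_migrations()
    error('migration 0001_init failed', 0)
end

local last_error = nil

local function up()
    last_error = nil  -- the cached error is reset on each operation call
    local ok, err = pcall(apply_migrations)
    if not ok then
        last_error = err
    end
    return ok, err
end

-- Only local state is inspected here, so collecting issues from every
-- instance does not trigger any extra cluster-wide map-reduce.
local function get_issues()
    if last_error == nil then
        return {}
    end
    return {{
        level = 'warning',
        topic = 'migrations',
        message = ('Migrations failed: %s'):format(tostring(last_error)),
    }}
end

return {up = up, get_issues = get_issues}
```
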
@DifferentialOrange requested review from psergee, better0fdead and filonenko-mikhail and removed the request for psergee · Apr 12, 2024, 11:57
@DifferentialOrange (Member, Author) commented Apr 15, 2024

The problem @filonenko-mikhail pointed out: the error is lost on restart, but the inconsistency is not.

@DifferentialOrange (Member, Author) commented

For now, I don't see any perfect solution to this one, for two reasons:

  • checking for inconsistency is always a full cluster operation,
  • the module does not have a single entrypoint in terms of Cartridge roles -- a user can trigger migrator.up from any node.

If we start to check for inconsistencies on instance start, it may break the cluster in case of a new cluster start, a full cluster restart, a half cluster restart, etc., since it would be N^2 requests again.

@DifferentialOrange (Member, Author) commented

Persisting the error on the up caller also doesn't seem like a good solution, since one may start migrations from an RO instance.

@DifferentialOrange (Member, Author) commented

Nonetheless, this solution is broken even without restarts: errors are reset on each up, but if an error has been caught on instance 1 and then one calls up on instance 2, everything will be consistent after the second up, yet the issue will still be there, since it is cached per instance.

@yngvar-antonsson commented

> The problem @filonenko-mikhail pointed out: the error is lost on restart, but the inconsistency is not.

We have several similar issues in Cartridge. I propose just adding a note that the issue stays until restart.
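
For illustration, such a note could be as small as extending the cached issue text (hypothetical wording, not part of this patch):

```lua
-- Hypothetical: hint in the issue text that it is cached per instance and
-- stays until the next migrations call or an instance restart.
local function format_issue_message(err)
    local note = 'This issue is cached on the instance and stays until ' ..
        'the next migrations call or an instance restart.'
    return ('Migrations failed: %s. %s'):format(tostring(err), note)
end
```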

@yngvar-antonsson commented

> Nonetheless, this solution is broken even without restarts: errors are reset on each up, but if an error has been caught on instance 1 and then one calls up on instance 2, everything will be consistent after the second up, yet the issue will still be there, since it is cached per instance.

Maybe we could add a "clear cached issues" button in Cartridge? Users can check the actual status of migrations on the migrations tab, can't they?
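
Continuing the sketch from the description, such a reset could be a trivial helper exposed to the UI (hypothetical; clearing the issue cluster-wide would still require calling it on every instance):

```lua
-- Hypothetical helper: clear the per-instance cached error so the issue
-- disappears from get_issues() without restarting the instance.
local function clear_last_error()
    last_error = nil
end
```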

Development

Successfully merging this pull request may close these issues.

Display migrations inconsistency on issues