cartridge: expose operation last error to issues #74
Expose the last operation error to Cartridge issues.

The original issue was about also exposing migration inconsistency from the new `migrations` tab to Cartridge issues. The straightforward approach is rather bad, though: checking for inconsistency is a full-cluster map-reduce operation, and if it were exposed to `get_issues` it would emit N^2 network requests. The reasons are that issues are collected from each instance, there is no way to check whether migrations are consistent without a cluster map-reduce, and there is no dedicated migration provider: any instance is a migration provider. And since `get_issues` may be triggered rather often, such a feature could make the cluster unhealthy (we already had a similar problem with metrics [1]).

The last error is reset on each operation call.

[1] tarantool/metrics#243 (tnt_cartridge_issues gather only local issues)

Closes #73
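To make the N^2 estimate concrete, here is a minimal counting sketch. It is only a model, not Cartridge code: the function name `requests_per_poll` and the assumption that one issue collection contacts every instance once, with each instance then contacting every other instance for the consistency check, are illustrative.

```python
def requests_per_poll(n_instances: int, cluster_wide_check: bool) -> int:
    """Count network requests for one cluster-wide issue collection (toy model)."""
    # Assumption: the collector asks each instance for its issues once.
    collect = n_instances
    # Assumption: a consistency check inside the issue handler makes every
    # instance run its own map-reduce over all the other instances.
    fanout = n_instances * (n_instances - 1) if cluster_wide_check else 0
    return collect + fanout


if __name__ == "__main__":
    for n in (3, 10, 50):
        local_only = requests_per_poll(n, cluster_wide_check=False)
        with_check = requests_per_poll(n, cluster_wide_check=True)
        print(f"instances={n} local-only={local_only} with-map-reduce={with_check}")
```

Under these assumptions the local-only variant grows linearly with the cluster size, while the variant with the consistency check grows quadratically, which is why the patch exposes only the locally stored last error.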
The problem @filonenko-mikhail pointed out: the error is lost on restart, but the inconsistency is not.
For now, I don't see any perfect solution to this one. Two points are the reason:
If we start to check for inconsistencies on instance start, it may break the cluster in case of a new cluster start, a full cluster restart, a half cluster restart, etc., since it would be N^2 again.
Persisting an error on
Nonetheless, this solution is broken even without restarts -- errors reset on each
We have several similar issues in Cartridge. I propose just adding a note that the issue stays until restart.
Maybe we could add some "clear cached issues" button in Cartridge? Users can check the actual status of migrations with the migrations tab, can't they?
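The lifetime of the exposed error discussed in this thread can be modelled with a small sketch. Everything here is hypothetical (`OperationErrorHolder`, `run`, `issues` and the sample error text are not Cartridge or migrations API); it assumes the error lives only in memory and is cleared at the start of every operation call.

```python
from typing import Callable, List, Optional


class OperationErrorHolder:
    """Toy model: keeps the last operation error in memory only."""

    def __init__(self) -> None:
        # A fresh holder stands for a freshly (re)started instance.
        self.last_error: Optional[str] = None

    def run(self, operation: Callable[[], None]) -> None:
        # Assumption: the stored error is reset on each operation call.
        self.last_error = None
        try:
            operation()
        except Exception as exc:
            self.last_error = str(exc)

    def issues(self) -> List[str]:
        # Local check only: no requests to other instances are needed.
        return [] if self.last_error is None else [self.last_error]


def failing_migration() -> None:
    raise RuntimeError("migration 002 failed")  # hypothetical error text


def successful_migration() -> None:
    pass


holder = OperationErrorHolder()
holder.run(failing_migration)
after_failure = holder.issues()      # the error is visible as an issue

holder.run(successful_migration)
after_retry = holder.issues()        # cleared by the next operation call

restarted = OperationErrorHolder()   # models an instance restart
after_restart = restarted.issues()   # the error did not survive the restart
```

In this model the issue disappears both after the next operation call and after a restart, while an actual migration inconsistency in the cluster would remain; that gap is what the comments above are about.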
Expose last operation error to Cartridge issues.
(Issues are also exposed to the default Grafana dashboard, as well as to the default alerts.)
(The error message could be improved, but it has always been like this: I haven't changed anything here in this patch.)