Add vttablet_restore_done hook #8007

Merged
deepthi merged 2 commits into vitessio:master on May 14, 2021

Conversation

ajm188
Contributor

@ajm188 ajm188 commented Apr 30, 2021

Description

This PR adds a new hook called vttablet_restore_done, which fires, in a background goroutine, whenever a tablet has finished restoring. In the case where a restore failed, we include the error string in the hook env.

We also now track the start and stop times of restoreDataLocked, and provide both as UTC RFC3339 timestamps in the env, along with the standard Go-formatted duration of stop - start.

Finally, I also updated the default tabletmanager.hookExtraEnv to include the tablet's keyspace and shard.

Local example

No hook
$ cd "$VTROOT"
$ ls | grep vttablet_restore_done
$ cd examples/local && ./101_initial_cluster.sh >/dev/null && cd -

Snippet of tablet logs:

I0429 22:51:33.071145   30140 metadata_tables.go:72] Populating _vt.local_metadata table...
I0429 22:51:33.624373   30140 tm_state.go:172] Changing Tablet Type: RDONLY
I0429 22:51:33.624401   30140 tm_state.go:324] Publishing state: alias:<cell:"zone1" uid:102 > hostname:"SFO-M-AMASON02" port_map:<key:"grpc" value:16102 > port_map:<key:"vt" value:15102 > keyspace:"commerce" shard:"0" type:RDONLY mysql_hostname:"SFO-M-AMASON02" mysql_port:17102
I0429 22:51:33.625584   30140 shard_sync.go:70] Change to tablet state
I0429 22:51:33.625677   30140 restore.go:103] No vttablet_restore_done hook.
I0429 22:51:33.659298   30140 replmanager.go:79] Replication Manager: stopped
I0429 22:51:33.659318   30140 updatestreamctl.go:228] Enabling update stream, dbname: vt_commerce
I0429 22:51:33.659352   30140 state_manager.go:212] Starting transition to RDONLY Serving, timestamp: 0001-01-01 00:00:00 +0000 UTC

Trivial hook
$ chmod +x vthook/vttablet_restore_done
$ cat vthook/vttablet_restore_done
#!/bin/bash

env | grep -E 'KEYSPACE|SHARD|TABLET_ALIAS|(^TM_.*)'
$ cd examples/local && ./101_initial_cluster.sh >/dev/null && cd -

Snippet of tablet logs:

I0430 08:37:41.278019   62066 hook.go:125] hook: executing hook: /Users/amason/work/vitess/vthook/vttablet_restore_done 
I0430 08:37:41.278899   62066 replmanager.go:79] Replication Manager: stopped
I0430 08:37:41.279001   62066 updatestreamctl.go:228] Enabling update stream, dbname: vt_commerce
I0430 08:37:41.279080   62066 state_manager.go:212] Starting transition to RDONLY Serving, timestamp: 0001-01-01 00:00:00 +0000 UTC
E0430 08:37:41.283444   62066 state_manager.go:276] Error transitioning to the desired state: RDONLY, Serving, will keep retrying: Unknown database 'vt_commerce' (errno 1049) (sqlstate 42000)
I0430 08:37:41.283488   62066 state_manager.go:661] State: exiting lameduck
E0430 08:37:41.283493   62066 tm_state.go:277] Cannot start query service: Unknown database 'vt_commerce' (errno 1049) (sqlstate 42000)
I0430 08:37:41.283541   62066 tm_state.go:324] Publishing state: alias:<cell:"zone1" uid:102 > hostname:"some-host" port_map:<key:"grpc" value:16102 > port_map:<key:"vt" value:15102 > keyspace:"commerce" shard:"0" type:RDONLY mysql_hostname:"some-host" mysql_port:17102
I0430 08:37:41.284409   62066 query.go:81] exec STOP SLAVE
I0430 08:37:41.284542   62066 query.go:81] exec RESET SLAVE ALL
I0430 08:37:41.285099   62066 query.go:81] exec RESET MASTER
I0430 08:37:41.285282   62066 hook.go:162] hook: result is result: HOOK_SUCCESS
stdout:
TM_RESTORE_DATA_START_TS=2021-04-30T12:37:40Z
KEYSPACE=commerce
TABLET_ALIAS=zone1-0000000102
SHARD=0
TM_RESTORE_DATA_DURATION=767.788809ms
TM_RESTORE_DATA_STOP_TS=2021-04-30T12:37:41Z

Related Issue(s)

Checklist

  • Tests were added or are not required N/A
  • Documentation was added or is not required

Deployment Notes

…ished restoring

We fire the hook in either success or error cases, and in the case of
error, include the error string in the hook env, so the hook can access
it.

Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
@ajm188 ajm188 requested a review from deepthi April 30, 2021 12:56
@deepthi
Member

deepthi commented May 11, 2021

We need an issue that documents the use case for this feature and what the missing functionality is. Perhaps this information is either (a) not available or (b) only logged and a pain to extract from logs?

@ajm188
Contributor Author

ajm188 commented May 12, 2021

Shoot, I thought I linked #8006, sorry!

Perhaps this information is either (a) not available or (b) only logged and a pain to extract from logs?

Definitely (b), I see a log line of:

xtrabackupengine.go:157] Starting backup with $n stripe(s), as well as a log line from StartBackup a little further up


Edit:

Shoot, I thought I linked #8006, sorry!

Turns out I'm bad at hand-editing <details><summary></summary></details> and missed a closing tag 🤦

Member

@deepthi deepthi left a comment

LGTM

@deepthi deepthi merged commit 0d9cc93 into vitessio:master May 14, 2021
@askdba askdba added Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature) labels Jun 10, 2021
Development

Successfully merging this pull request may close these issues.

Add hook for vttablet restore phase finishing