Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: read and publish gh-ost FATAL message where possible #8192

Merged

Conversation

shlomi-noach
Copy link
Contributor

Description

When running gh-ost migrations, and gh-ost fails, gh-ost produces a detailed error log. But the vttablet that executes gh-ost only tells something like failed running command: ... ... error=exit status 1.

When the user consequently runs SHOW VITESS_MIGRATIONS LIKE '...' they see aforementioned generic error in the message column:

     migration_uuid: 5ca3b482_be0f_11eb_9cd1_0a43f95f28a3
           keyspace: commerce
              shard: 0
       mysql_schema: vt_commerce
        mysql_table: nopk
migration_statement: alter table nopk add primary key (id)
           strategy: gh-ost
            options: -skip-topo
    added_timestamp: 2021-05-26 10:44:35
requested_timestamp: 2021-05-26 10:44:35
    ready_timestamp: 2021-05-26 10:44:36
  started_timestamp: 2021-05-26 10:44:36
 liveness_timestamp: NULL
completed_timestamp: 2021-05-26 10:44:36
  cleanup_timestamp: NULL
   migration_status: failed
           log_path: ip-172-31-45-90:/tmp/online-ddl-5ca3b482_be0f_11eb_9cd1_0a43f95f28a3-409940722
          artifacts: _5ca3b482_be0f_11eb_9cd1_0a43f95f28a3_20210526104436_gho,_5ca3b482_be0f_11eb_9cd1_0a43f95f28a3_20210526104436_ghc,_5ca3b482_be0f_11eb_9cd1_0a43f95f28a3_20210526104436_del,
            retries: 0
             tablet: zone1-0000000100
     tablet_failure: 0
           progress: 0
  migration_context: vtgate:74e5e7d6-be0b-11eb-9cd1-0a43f95f28a3
         ddl_action: alter
            message: failed running command: ... ... ; error=exit status 1

This PR provides more useful information in in message column.

When gh-ost fails, it typically fails with a FATAL log message. This PR extract that FATAL log message, if possible. If successful, it replaces the generic error with the more informative message from the log. Examples:

*************************** 17. row ***************************
                 id: 17
     migration_uuid: 89ac1fd5_be13_11eb_8862_0a43f95f28a3
           keyspace: commerce
              shard: 0
       mysql_schema: vt_commerce
        mysql_table: corder
migration_statement: alter table corder drop column no_such_column
           strategy: gh-ost
            options: -skip-topo
...
         ddl_action: alter
            message: FATAL Error 1091: Can't DROP 'no_such_column'; check that column/key exists
        eta_seconds: -9223372036854775808

                 id: 18
     migration_uuid: 08bb121d_be14_11eb_8ce7_0a43f95f28a3
           keyspace: commerce
              shard: 0
       mysql_schema: vt_commerce
        mysql_table: nopk
migration_statement: alter table nopk add primary key (id)
           strategy: gh-ost
            options: -skip-topo
...
         ddl_action: alter
            message: FATAL No PRIMARY nor UNIQUE key found in table! Bailing out
        eta_seconds: -9223372036854775808

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels May 26, 2021
@shlomi-noach shlomi-noach requested a review from a team May 26, 2021 12:01
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.
Can one of the existing tests be enhanced to expect an error message?

@shlomi-noach
Copy link
Contributor Author

Can one of the existing tests be enhanced to expect an error message?

You're absolutely right, yes, I'll add a test

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Test added.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

Re-request for review 🙏

@shlomi-noach shlomi-noach requested a review from a team June 3, 2021 12:29
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're into that ParseAndBind lifestyle now. 👍👍

@shlomi-noach
Copy link
Contributor Author

I see you're into that ParseAndBind lifestyle now. +1+1

Heh, I actually created ParseAndBind...!

@shlomi-noach shlomi-noach merged commit e018d0f into vitessio:main Jun 3, 2021
@shlomi-noach shlomi-noach deleted the online-ddl-ghost-fatal-message branch June 3, 2021 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants