Invalid byte sequence in UTF-8 while pushing/pulling db

Alessandro Fazzi edited this page Mar 24, 2018 · 16 revisions

Why?

As you know, each varchar/text database column has a charset (encoding) and a collation. For example, you can choose utf8 for one column and latin1 for another.

However, mysqldump uses UTF-8 as its default character set when dumping your database. Because of this, your dump file can contain byte sequences that are not valid UTF-8.

When wordmove tries to open this dump file, it raises invalid byte sequence in UTF-8. wordmove cannot guess your dump's encoding and, in any case, it will never try to convert your data to UTF-8 automatically: doing so would delete every character that is not valid UTF-8.
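A minimal Ruby sketch of the failure described above, assuming a value stored in a latin1 column (treated here as ISO-8859-1) that ends up in a dump read as UTF-8. The helper `regexp_error` and the sample text are hypothetical and are not part of Wordmove. It shows the error being raised, what a blind cleanup loses, and that a lossless conversion needs the source encoding to be known.

```ruby
# Hypothetical helper: returns the error message raised when a regexp is
# applied to the string, or nil if the match runs without error.
def regexp_error(text)
  text =~ /caff/
  nil
rescue ArgumentError => e
  e.message
end

# "caffè" as stored in a latin1 column: è is the single byte 0xE8.
latin1_bytes = "caff\xE8".b

# A dump read as UTF-8 carries that byte unchanged, which is not valid UTF-8.
as_utf8 = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?   # false
puts regexp_error(as_utf8)     # invalid byte sequence in UTF-8

# Blind cleanup: dropping the invalid bytes silently loses the è.
puts as_utf8.scrub("")         # caff

# Lossless conversion only works when the source encoding is known.
puts latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8) # caffè
```

The last two lines are the reason an automatic fix is risky: without knowing the original encoding, the only safe-looking option is the lossy one.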

What should I do?

Use mysqldump's --hex-blob flag (recommended)

Thanks @360Zen (ref)

Anyway, adding the --hex-blob flag to mysqldump_options seems to perfectly address the issue. It converts all binary columns to hex code, which is conveniently UTF-8-friendly, thereby avoiding any issues with the adapter. I know that this flag has the potential to create larger dump files, but given that it fixes a known Wordmove bug/issue, I wonder if it would be worth including as part of the default dump settings. If nothing else, I would suggest adding mysqldump_options: --hex-blob as a possible workaround on the wiki page I referenced above.
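As a rough illustration of why hex output sidesteps the problem, the sketch below renders raw bytes as a `0x...` literal and checks that the result is plain ASCII. `hex_literal` is a hypothetical helper, not something provided by mysqldump or Wordmove, and the sample bytes and the exact formatting mysqldump emits are assumptions.

```ruby
# Hypothetical helper: renders raw bytes as a 0x... hex literal, in the
# spirit of what --hex-blob writes for binary columns (format assumed).
def hex_literal(bytes)
  "0x" + bytes.unpack("H*").first.upcase
end

# Four raw bytes that do not form valid UTF-8 (assumed sample data).
raw = "\xFF\xD8\xFF\xE0".b

# Written as-is into a dump read as UTF-8, the bytes are invalid.
puts raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Written as a hex literal, only ASCII characters reach the dump file.
literal = hex_literal(raw)
puts literal
puts literal.ascii_only?
```

The trade-off mentioned in the quote is visible here too: each byte becomes two characters, so dumps with many binary columns grow.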

Using SqlAdapter::Default

You should use UTF-8 for all of your varchar/text database columns, taking care of invalid characters (if any). Please search on Stack Overflow: you are not alone :)

We had scenarios where the problem was inside specific third-party tables created by plugins; if it fits your needs, you can ignore those tables while dumping (refer to the wiki for more documentation about how to do that).

Using SqlAdapter::Wpcli

You can try the new SqlAdapter introduced in version 2.1.0 (see its docs). It could fix this problem in certain situations.

I have a better idea

If you think this problem could be managed automagically in wordmove without losing data, please submit a PR!

Example error message

    /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/sql_adapter.rb:44:in `gsub!': invalid byte sequence in US-ASCII (ArgumentError)
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/sql_adapter.rb:44:in `serialized_replace!'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/sql_adapter.rb:36:in `replace_field!'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/sql_adapter.rb:25:in `replace_vhost!'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/sql_adapter.rb:17:in `adapt!'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/deployer/base.rb:168:in `adapt_sql'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/deployer/ssh.rb:39:in `pull_db'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/cli.rb:69:in `block in pull'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/cli.rb:37:in `block in handle_options'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/cli.rb:36:in `each'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/cli.rb:36:in `handle_options'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/lib/wordmove/cli.rb:68:in `pull'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/thor-0.19.1/lib/thor/command.rb:27:in `run'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/thor-0.19.1/lib/thor/invocation.rb:126:in `invoke_command'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/thor-0.19.1/lib/thor.rb:359:in `dispatch'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/thor-0.19.1/lib/thor/base.rb:440:in `start'
	from /usr/local/rvm/gems/ruby-2.3.0/gems/wordmove-2.0.0/exe/wordmove:6:in `<top (required)>'
	from /usr/local/rvm/gems/ruby-2.3.0/bin/wordmove:23:in `load'
	from /usr/local/rvm/gems/ruby-2.3.0/bin/wordmove:23:in `<main>'
	from /usr/local/rvm/gems/ruby-2.3.0/bin/ruby_executable_hooks:15:in `eval'
	from /usr/local/rvm/gems/ruby-2.3.0/bin/ruby_executable_hooks:15:in `<main>' 

Solutions / suggestions from the community

Add your own here :)

@360Zen

Suggested the above recommended solution 🎉

@anhdq1801

  1. Modify or create `~/.profile`.
  2. Add these two lines to `~/.profile`:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
  3. Source your `~/.profile`:

source ~/.profile
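One plausible reason the locale matters: Ruby derives its default external encoding from the environment, and a file read as US-ASCII fails on the first non-ASCII byte, which matches the US-ASCII wording in the example error message. The sketch below assumes that explanation. `read_and_replace` is a hypothetical helper and the file content is sample data.

```ruby
require "tempfile"

# Hypothetical helper: reads a file with the given external encoding and runs
# a regexp substitution on it, returning the result or the error message.
def read_and_replace(path, encoding)
  File.read(path, encoding: encoding).gsub(/caff/, "tea")
rescue ArgumentError => e
  e.message
end

# What Ruby picked up from the environment (LANG / LC_ALL) for this process.
puts Encoding.default_external

Tempfile.create("dump") do |file|
  file.binmode
  file.write("caff\u00E8\n") # valid UTF-8 bytes on disk
  file.flush

  # Same bytes, different outcome depending on the encoding assumed on read.
  puts read_and_replace(file.path, "US-ASCII")
  puts read_and_replace(file.path, "UTF-8")
end
```

If the first line printed is US-ASCII on your machine, exporting a UTF-8 locale as described above is worth trying before anything else.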

@byron222

Add

sql_content.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
sql_content.encode!('UTF-8', 'UTF-16')
sql_content.force_encoding("UTF-8")

to this file, above line 45:

/root/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/wordmove-2.1.2/lib/wordmove/sql_adapter/default.rb
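To see what those three lines do before patching an installed gem, they can be tried on a sample string. In this sketch, `drop_invalid_utf8!` is a hypothetical wrapper around the same calls and the SQL line is made-up sample data. Note that the round trip through UTF-16 removes the invalid bytes, so the affected characters are lost.

```ruby
# Hypothetical helper wrapping the three lines from the suggestion above.
def drop_invalid_utf8!(sql_content)
  sql_content.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '')
  sql_content.encode!('UTF-8', 'UTF-16')
  sql_content.force_encoding("UTF-8")
end

# Assumed sample: a dump line whose value holds a stray latin1 byte (0xE8).
line = "INSERT INTO t VALUES ('caff\xE8 latte');".dup

puts line.valid_encoding?   # invalid before the round trip
drop_invalid_utf8!(line)
puts line.valid_encoding?   # valid afterwards
puts line                   # the stray byte is gone, and so is the character
```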