MySQL default encoding #813

Closed
alexkazik opened this issue Jun 1, 2018 · 7 comments

Comments

@alexkazik

The default encoding persistent chooses for all text-like columns is utf8.
The problem is that MySQL's utf8 can only store BMP characters (i.e. UTF-8 sequences of up to three bytes).

There is a new character encoding called utf8mb4 which is what the rest of the world calls UTF-8.

It was added in MySQL 5.5.3 (2010-03-24).

(Some time later, utf8 was renamed to utf8mb3 to describe it more accurately, and utf8 was kept as an alias for it.)

A conversion from utf8 to utf8mb4 can always be done.
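
To make the three-byte limit concrete, here is a minimal sketch in plain Haskell (base only). The names utf8Length and fitsInUtf8mb3 are made up for illustration and are not part of persistent or any MySQL library; the byte ranges are the standard UTF-8 ones.

```haskell
import Data.Char (ord)

-- | Number of bytes UTF-8 needs to encode one character.
utf8Length :: Char -> Int
utf8Length c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3   -- end of the BMP
  | otherwise   = 4   -- supplementary planes, e.g. most emoji
  where
    n = ord c

-- | True when every character needs at most three bytes, i.e. the
-- string could be stored in a MySQL utf8 (utf8mb3) column.
fitsInUtf8mb3 :: String -> Bool
fitsInUtf8mb3 = all ((<= 3) . utf8Length)
```

With these definitions, fitsInUtf8mb3 "caf\xE9" is True, while fitsInUtf8mb3 "\x1F600" is False because U+1F600 lies outside the BMP and needs four bytes.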

I'd like to either have an option to select the default or simply change it (MySQL.hs:773,775).
Currently I have to write all SQL types by hand to accomplish that.

That the connection is also utf8 by default does not really help either, but that setting can be modified (Database.MySQL.Base.defaultConnectInfo in the "mysql" package).

@paul-rouse
Contributor

I agree that the present default is very unfortunate, but I don't think we can simply change it, because migrating an existing database from utf8mb3 to utf8mb4 is not totally straightforward: see the MySQL Reference Manual. An optional setting would be fine - would you like to try a PR?

@alexkazik
Author

I'd prefer a global option, but I don't see a way to do that. I could write createMySQLPoolWithDefaultEncoding and withMySQLConnWithDefaultEncoding, which allow you to specify the wanted encoding, and make createMySQLPool an alias for createMySQLPoolWithDefaultEncoding "utf8". What do you think?
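
A rough sketch of the shape this proposal could take, under loud assumptions: the Pool type below is a simplified stand-in (the real functions take a ConnectInfo and run in a monad), textColumnType is a hypothetical helper, and the "TEXT CHARACTER SET ..." output is only an assumed form of the generated column type. The point is just that the existing name becomes the new function applied to "utf8".

```haskell
-- Simplified stand-in: a "pool" that only remembers its size and the
-- character set to use for text-like columns.
data Pool = Pool
  { poolSize     :: Int
  , poolEncoding :: String
  } deriving (Eq, Show)

-- | Hypothetical helper: render the SQL type of a text column, with an
-- optional maximum length, using the pool's default encoding.
textColumnType :: Pool -> Maybe Int -> String
textColumnType pool maxLen = case maxLen of
  Nothing -> "TEXT" ++ charset
  Just n  -> "VARCHAR(" ++ show n ++ ")" ++ charset
  where
    charset = " CHARACTER SET " ++ poolEncoding pool

-- | New entry point: the caller chooses the default encoding.
createMySQLPoolWithDefaultEncoding :: String -> Int -> Pool
createMySQLPoolWithDefaultEncoding enc size =
  Pool { poolSize = size, poolEncoding = enc }

-- | Existing name, kept as an alias so current callers still get utf8.
createMySQLPool :: Int -> Pool
createMySQLPool = createMySQLPoolWithDefaultEncoding "utf8"
```

Existing code calling createMySQLPool would keep its utf8 columns, while createMySQLPoolWithDefaultEncoding "utf8mb4" opts in explicitly.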

@paul-rouse
Contributor

Yes, I think that is how you would have to specify it. I did wonder about taking the default from the CharsetName option in the ConnectInfo (if it is there at all, falling back to "utf8" if not), but it feels to me as if that would be confusing the connection charset with the default column type.

BTW feel free to send in a PR to https://github.com/paul-rouse/mysql to add defaultConnectInfoMB4 if you like - the spelling there is meant to be consistent with the mysql-haskell package.

alexkazik pushed a commit to alexkazik/persistent that referenced this issue Jun 13, 2018
@alexkazik
Author

I've created a first version: alexkazik@1d389a1

  • Is MB4 enough and should *Enc not be added?
  • I've added a newtype for the default encoding, just to be sure that the type fits; should it be removed?

MockMigration still uses utf8, but since that code is not executed it should be fine.

Haven't tried it in a real project though. The Haskell library mysql always segfaults, so I'm using mysql-haskell and persistent-mysql-haskell. I want to add these changes there as well.

@naushadh
Contributor

naushadh commented Jul 6, 2018

Duplicate of #679

alexkazik pushed a commit to alexkazik/persistent that referenced this issue Jul 7, 2018
@alexkazik
Author

I've created a second version, based upon a discussion on persistent-mysql-haskell.

If the ConnectInfo states that the charset is utf8mb4, then it is also used for the default text types; otherwise utf8 is still used.
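
The rule could be sketched roughly as follows. This is not the code from the commit: Option and ConnectInfo here are cut-down stand-ins for the real types, and defaultTextCharset is a hypothetical name.

```haskell
-- Cut-down stand-ins for the connection settings; only the charset
-- option matters for this sketch.
data Option
  = CharsetName String
  | ConnectTimeout Int
  deriving (Eq, Show)

newtype ConnectInfo = ConnectInfo { connectOptions :: [Option] }

-- | Hypothetical helper: choose the character set for default text
-- columns. Only an explicit utf8mb4 connection charset changes the
-- result; anything else keeps the old utf8 default.
defaultTextCharset :: ConnectInfo -> String
defaultTextCharset info
  | "utf8mb4" `elem` charsets = "utf8mb4"
  | otherwise                 = "utf8"
  where
    charsets = [name | CharsetName name <- connectOptions info]
```

Existing setups that never mention utf8mb4 keep generating utf8 columns, so the change stays opt-in.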

alexkazik@81850ef

Opinions on either this or the previous idea?

@parsonsmatt
Collaborator

Merged in #980
