two-byte characters in char and varchar are truncated to one byte #1294
Comments
👋 @Ceshion Hey there!
Well, yeah, because it's not doing what it's supposed to. 😬 Your solution "works", but it basically puts the task of converting from multibyte characters to single byte characters on the server, instead of handling this at the client level. If a user wants this conversion to happen on the database level, they should be using SQL Servers Based on the discussion over in #723, I think the proper fix would be to:
What do you think? Does that make sense to you? 🙇‍♂️
Hi @arthurschreiber! I appreciate you bearing with me here. I have only been learning most of the relevant info about encodings in the past few days, and I'm trying my best to keep up! Yes, I see the error in my explanation and understanding: we can't store the entire BMP in one column, and characters are single-byte, just on a specific codepage. Thank you for pointing that out 😁 🤔 My initial thought had been that, because the server already has the information it needs to convert whichever multibyte characters it can for a given column (based on the collation, which it knows about), it would take less negotiation to just allow it to convert what is technically a Unicode parameter to whatever it should be. A client would need to get that info from the server anyway (or else know specific details about the server ahead of time), wouldn't it? Not to say, of course, that it wouldn't technically be right, but I had interpreted the decision to use nchar/nvarchar by default in ADO and JDBC as being based on that logic, and it seems reasonable to me. What do you think? Should we still put the responsibility of interpreting the correct codepage for a column (in a specific table and database) on the client?
That is a valid decision to make, and it's something that I think the application that uses This probably requires better documentation, something along the lines of "if you just want to send JavaScript string values to the database, use On another note, I still think we should fix the
Oh I agree, allowing a consumer to specify a codepage for tedious/src/data-types/varchar.ts Lines 102 to 119 in ceb73d3
So I can take the idea of using Unicode parameters instead higher up the chain (again 😅) and work on adding encoding options.
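To make the "specify a codepage" idea concrete, here is a rough sketch of what client-side encoding into a single-byte code page could look like. It is an illustration only: `encodeCp1252` and `CP1252_EXTRAS` are hypothetical names, not part of tedious, and the table covers just three Windows-1252 entries rather than the full code page. A real implementation would more likely delegate to a library such as iconv.

```javascript
// Hypothetical, partial Windows-1252 table: only the entries this sketch needs.
const CP1252_EXTRAS = new Map([
  [0x20ac, 0x80], // € EURO SIGN
  [0x2020, 0x86], // † DAGGER
  [0x2021, 0x87], // ‡ DOUBLE DAGGER
]);

// Hypothetical helper (not a tedious API): encodes a JavaScript string into
// single-byte values for one specific code page.
function encodeCp1252(value) {
  const bytes = [];
  for (const ch of value) {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 0x80 || (codePoint >= 0xa0 && codePoint <= 0xff)) {
      // ASCII and the 0xA0-0xFF range map to the same byte value in Windows-1252.
      bytes.push(codePoint);
    } else if (CP1252_EXTRAS.has(codePoint)) {
      bytes.push(CP1252_EXTRAS.get(codePoint));
    } else {
      // Not covered by this partial table: substitute "?" instead of
      // silently truncating the code point.
      bytes.push(0x3f);
    }
  }
  return Buffer.from(bytes);
}

console.log(encodeCp1252("a\u2021")); // <Buffer 61 87>
console.log(encodeCp1252("\u4e2d")); // <Buffer 3f>
```

The point of the sketch is that the client has to know the target code page up front, which is the trade-off discussed above: the byte for ‡ is 0x87 in Windows-1252 but would differ, or not exist, under another collation.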
@Ceshion have you tried the latest Can this issue be closed?
Duplicate of #723
Char and varchar are currently configured to read the values passed to them purely as ascii, but char and varchar columns in MSSQL can support the entire BMP (i.e. one- and two-byte UTF-16 characters). On read this is handled by decoding records with iconv, but the current state is that storing a two-byte character such as "\u2021" (‡) in a char or varchar column will truncate that character to only the second byte, resulting in an incorrect character being stored, such as "\u0021" (!).
We could resolve this by encoding char and varchar parameters as "ucs2" with nchar and nvarchar type IDs when serializing them into the RPC stream, as in the attached PR and similar to the approach used by ADO and JDBC per the linked issue. Is there any reason not to do this?
Only changing the encoding does not work, since it seems like downstream from tedious the bytes are read as separate characters.
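The truncation described above can be reproduced outside tedious with Node's `Buffer` API. This is a minimal sketch of the byte-level effect only, assuming the value is encoded with the "ascii" encoding name; it is not the library's actual serialization code.

```javascript
// Sketch only: shows the byte-level effect with Node's Buffer API,
// not tedious's actual serialization code.
const value = "\u2021"; // ‡ (DOUBLE DAGGER)

// When encoding a string, Node treats "ascii" like "latin1": each UTF-16
// code unit is cut down to its low byte, so 0x2021 becomes 0x21.
const asAscii = Buffer.from(value, "ascii");
console.log(asAscii); // <Buffer 21>
console.log(asAscii.toString("latin1")); // prints: !

// "ucs2" (an alias of "utf16le") keeps both bytes, low byte first.
const asUcs2 = Buffer.from(value, "ucs2");
console.log(asUcs2); // <Buffer 21 20>
console.log(asUcs2.toString("ucs2") === value); // true
```

Note that the "ucs2" buffer is two bytes long, which is why the type ID sent alongside it matters: a receiver that expects one byte per character will read 0x21 and 0x20 as two separate characters.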