-
-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
non-ascii characters assigned to var/char columns in SQL are truncated to one byte #7932
Comments
convert typeorm char and varchar parameters to n' ' in order to allow saving non-ascii characters Closes: typeorm#7932
This isn't a TypeORM issue, is it? We don't mangle anything here. Wouldn't this be a problem for the database itself or for the mssql node driver? |
as mentioned here. I also don't understand why you are trying to store unicode characters on something that doesn't support unicode. It doesn't make sense to do that and bring monkey patches to it. Unless I didn't get something. |
In theory I totally agree that converting like this is something the database driver (or tedious itself, really) should be handling, however per the linked tedious issue, solving it there seems prohibitively difficult. Following this comment, it seems that just using nvar/char parameters by default is a common solution, and as typeorm is the originator of the parameters to use, it seems appropriate to change their type here if anywhere, right? Edit: thinking it over again, it may make more sense to handle a conversion like this in the driver- where it is invisible to the consumer i.e. typeorm, but I would like if we could still consider this solution in case another one couldn't be reached or agreed upon there. |
Before all this, that's what I thought too! However, I found that empirically, MS-SQL absolutely can store 2-byte characters in char and varchar columns, or at least it can as of SQL Server 2017. I have not tested backward unless the test runner for typeorm does. Given that is the case though, I think we by some means need to get whatever the end consumer wants to store to the database intact, and let it decide if it can or cannot do it. In this case and to that end, storing something that cannot be represented by the codepage specified by the database's collation settings will cause an error, correctly. |
All that said, I would be willing to write a revision for tedious and/or mssql as well- if it can be resolved there, to the satisfaction of the maintainers of those projects, it would of course make this irrelevant. |
I opened this issue for tedious (with a PR) to attempt to address the same issue at a lower level. |
…ns retains data add tests to ensure that using nchar and nvarchar parameters will correctly store data, and that the actual underlying data type has not changed. Validates typeorm#7932 test: verify that storing strings with characters that are too long for var/char throws ensure that we will fail loudly when an attempt is made to store a string that cannot be converted to the underlying type because it has too-long characters Validates typeorm#7932
…ns retains data add tests to ensure that using nchar and nvarchar parameters will correctly store data, and that the actual underlying data type has not changed. test: verify that storing strings with characters that are too long for var/char throws ensure that we will fail loudly when an attempt is made to store a string that cannot be converted to the underlying type because it has too-long characters Validates typeorm#7932
…ns retains data add tests to ensure that using nchar and nvarchar parameters will correctly store data, and that the actual underlying data type has not changed. test: verify that storing strings with characters that are too wide for var/char throws ensure that we will fail loudly when an attempt is made to store a string that cannot be converted to the underlying type because it has characters that are not in the codepage used by the database/table/column Validates typeorm#7932
…de equivalent this allows the server to handle mapping extended ASCII characters into the correct codepage for a given column without needing to pre-check collation settings Fixes typeorm#7932
…de equivalent this allows the server to handle mapping extended ASCII characters into the correct codepage for a given column without needing to pre-check collation settings Fixes typeorm#7932
@pleerock per discussion on tedious #1294, it appears to still be up to consumers (such as typeorm) to specify how to encode any non-ASCII characters. Today, the only option we have for that is to send parameters as unicode types regardless of the column type. Encoding strings in the consuming code and passing them as a buffer into tedious appears to be an intended use, but is currently not possible, since tedious' validation for varchar and char calls As it is, I think it seems more efficient to handle this as a consumer by just using unicode parameters by default, since then there is no need to preprocess the data according to the specific configuration of a database- query for column/table/database collation, store it, use it to encode incoming data according to its intended destination. . . I think it is pretty apparent that gets messy fast. By using unicode parameters to begin with, typeorm can avoid needing to be concerned with all of these specifics about a database and queries to run in order to store data correctly, without having to rely on a similar conversion to unicode parameters happening on a lower level- we can just be explicit about it. On the other hand, if we leave this up to consumers of typeorm to handle, then they have to make a decision between benefitting from the lower storage requirement of (var)char but not being able to store half of the characters they should be able to in any case, or accepting the larger storage requirement of n(var)char in order to even just store strings that fit in extended (single-byte) ASCII. Or otherwise using a patch like, for example, creating a column as varchar in the database and marking it as nvarchar in its column decorator so that parameters with values to be stored in that column will be nvarchar- which does work, but is by far hackier than just automatically changing only the parameter type. Could we consider using this solution after all? |
Issue Description
We are using MS-SQL in typeorm, and our data model includes some
varchar
columns. When we are inserting or updating records using strings that contain non-ASCII characters, those characters are stored without any errors, but using only their second byte. For example, if we store"\u2021"
(‡), the value actually stored to the database will be equal to"\u0021"
(!). Note also that the object returned from e.g.save
shows the original value,"\u2021"
(‡). The collation setting for the database, tables, and columns are allSQL_Latin1_General_CP1_CI_AS
, and the characters we are attempting to store can therefore be represented by the corresponding codepage.Expected Behavior
If a value can be stored correctly in a given var/char column in MS-SQL based on the relevant collation settings, we would expect that it would be stored unchanged. If it cannot, we would expect an error to be thrown when trying to store it. It seems that determining whether a value can be stored or not should be the responsibility of the database server, and we could just forward any error from it to the end consumer.
Actual Behavior
Attempting to store any value that contains characters that do not fit into one byte in MS-SQL results in each character being stored as if only the second byte were set.
Steps to Reproduce
mssql
driver, and an ormconfig with atype: 'mssql'
connection which has an entity type that has at least onechar
orvarchar
column./index.ts
./entity/OneModel.ts
My Environment
mssql
driver versionAdditional Context
I have tracked down the point where truncation occurs to be within tedious'
data-types/varchar
module, in which varchar values are converted to bytes to be written into the SQL RPC request stream usingBuffer.from(value, 'ascii')
. Based on this issue it seems that the onus is on consumers to usenvarchar
parameters when appropriate as a solution to this issue, even when setting avarchar
column.Relevant Database Driver(s)
aurora-data-api
aurora-data-api-pg
better-sqlite3
cockroachdb
cordova
expo
mongodb
mysql
nativescript
oracle
postgres
react-native
sap
sqlite
sqlite-abstract
sqljs
sqlserver
Are you willing to resolve this issue by submitting a Pull Request?
The text was updated successfully, but these errors were encountered: