Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VarChar encoding problem. #723

Open
heroboy opened this issue Apr 2, 2018 · 16 comments
Open

VarChar encoding problem. #723

heroboy opened this issue Apr 2, 2018 · 16 comments

Comments

@heroboy
Copy link
Contributor

heroboy commented Apr 2, 2018

see this:

writeParameterData: function(buffer, parameter) {
if (parameter.value != null) {
if (parameter.length <= this.maximumLength) {
buffer.writeUsVarbyte(parameter.value, 'ascii');
} else {
buffer.writePLPBody(parameter.value, 'ascii');
}
} else {
if (parameter.length <= this.maximumLength) {
buffer.writeUInt16LE(NULL);
} else {
buffer.writeUInt32LE(0xFFFFFFFF);
buffer.writeUInt32LE(0xFFFFFFFF);
}

You use the ascii encoding, is it incorrect?
I see you consider the encoding in value-parser.js.
So I find that the input parameter encoding is not correct, the select result is correct.

@heroboy
Copy link
Contributor Author

heroboy commented Apr 2, 2018

maybe relate to #113 , but it should use iconv-lite to encode string.

@Hadis-Knj
Copy link

Hadis-Knj commented Apr 2, 2018

@heroboy varchar stores 1 byte (ascii) characters, there's an equivalent function in nvarchar datatype object to handle the Unicode characters

writeParameterData: function(buffer, parameter) {
if (parameter.value != null) {
if (parameter.length <= this.maximumLength) {
buffer.writeUsVarbyte(parameter.value, 'ucs2');
} else {
buffer.writePLPBody(parameter.value, 'ucs2');
}
} else {

If you have faced an issue would you please send a repo?

@heroboy
Copy link
Contributor Author

heroboy commented Apr 3, 2018

hi @Hadis-Fard,
Please try this:

db.acquire((err, conn) =>
{
	if (err) return;

	var request = new tedious.Request("select '中文',@v1,@v2,@v3; set @v4 = '中文';", function (err)
	{
		if (err)
		{
			console.log(err);
			return;
		}
		conn.release();
	});
	const iconv = require('iconv-lite');
	request.on('row', cols =>
		console.log(cols.map(x => x.value))
	);
	request.on('returnValue', (name, value) =>
	{
		console.log(name, value);
	})
	console.log(iconv.encode('中文', 'cp936'));
	request.addParameter('v1', tedious.TYPES.VarChar, '中文')
	request.addParameter('v2', tedious.TYPES.NVarChar, '中文')
	request.addParameter('v3', tedious.TYPES.VarChar, iconv.encode('中文', 'cp936'),{length:128})
	request.addOutputParameter('v4', tedious.TYPES.VarChar, null);
	conn.execSql(request);
})

The output is

[ '中文', '-', '中文', '����' ] //shoud be [ '中文', '中文', '中文', '中文' ], the varchar values are incorrect.
v4 中文

I tried use buffer as the parameter value,but it didn't work. Becuase in VarChar.prototype.validate you convert it to string if it is not string. I changed it to

validate: function validate(value) {
    if (value == null) {
      return null;
    }
    if (typeof value !== 'string' && !Buffer.isBuffer(value)) { // I added this isBuffer
      if (typeof value.toString !== 'function') {
        return TypeError('Invalid string.');
      }
      value = value.toString();
    }
    return value;
  }

And the output would be

[ '中文', '-', '中文', '中文' ]
v4 中文

The v3 become correct

@heroboy
Copy link
Contributor Author

heroboy commented Apr 3, 2018

And I don't understand what you mean about varchar stores 1 byte (ascii) characters. As I know varchar is a byte string, but I am not sure the encoding is depend on SQLServer's settings or client OS settings or something else. Maybe I guess SQLServer's codepage setting is used to convert varchar to nvarchar, otherwise the varchar is always a byte string, what it is is explained by the client side.
Sorry my English is not good. Hope you undenstand what I was saying.

@Hadis-Knj
Copy link

as it is mentioned in MSDN varchar/char is non-Unicode string data, and nvarchar/nchar is unicode string data. so when you have a Unicode string data you should use nvarchar.

Also we want to validate the char parameters to be string or if it is possible we convert it to string
for the case of ' iconv.encode('中文', 'cp936')' you should use VarBinary type, as the iconv returns buffer.

@heroboy
Copy link
Contributor Author

heroboy commented Apr 4, 2018

If a table column type is varchar, it can correctly store '中文'( Do you think it is an unicode string data that can't be stored?)。For example, the sql is select * from table where some_varchar_column = @value,what the @value should be? In other engine like ado, I can always use varchar correctly. Of cause, in tedious I can use nvarchar, but it will be an implicity type cast in SQLServer.

Another example, set @value = '中文',if the @value is varchar。In tedious, I can correctly get '中文' in returnValue event. This is what I mean "the input parameter encoding is not correct". And prove that when I have a Unicode string data I can use varchar.

I see in varchar.js the function resolveLength think value can be a buffer. writeParameterData calls writable-tracking-buffer.write, it also think value can be a buffer. So I think it is a bug that validate doesn't allow value to be a buffer.

@heroboy
Copy link
Contributor Author

heroboy commented Apr 4, 2018

What I do iconv.encode('中文', 'cp936') is bacuse varchar.js doesn't correctly encode the string,I manualy do it. Not becuase it is a varbinary. So please understand Unicode, encoding, string, byte, char. You English speaking people seldom suffer from it.

@heroboy
Copy link
Contributor Author

heroboy commented Apr 4, 2018

So please at least make the parameter value can be a buffer. It's simple to fix.

@arthurschreiber
Copy link
Collaborator

@heroboy 👋

Sorry you're running into this issue with tedious. 🙇Most people that use tedious use an ascii compatible encoding when using varchar columns, and that's the reason why this issue did not come up earlier.

You're right, varchar columns can store data in other encodings, like cp936, and tedious already supports this correctly when reading data out of a table, where it uses iconv-lite to convert from the table's encoding back into the JS native encoding.

Unfortunately, query parameters do not support specifying an encoding option, but it should be possible to add this. I'm against allowing Buffers to be passed in for string types like varchar or nvarchar - the current type handling is already a mess and confusing in many cases, adding more special exceptions like this will just make it harder to clean this up in the future. 😞

I think the correct way to support this is to store the SQL Collation that is send to the client via the ENVCHANGE token, and use that collation automaticall for all non-unicode columns (char, varchar, text).

@heroboy @Hadis-Fard What do you think?

@arthurschreiber
Copy link
Collaborator

Actually, I just saw that we have an internal sqlCollationChange event, that is not used anywhere. Wiring this all up correctly might be a bit tricky, but I believe this will be the "correct" solution for the encoding/collation issue here.

@heroboy
Copy link
Contributor Author

heroboy commented Apr 4, 2018

@arthurschreiber
Yes I think use the sql server 's collation setting is correct. When I read tedious source code, I find there is a codepage parameter when parsing value, so the output parameter is correct. But I can't find codepage like setting that sql server send to you when connecting to the sqlserver. I am familiar with tedious protocol.

When writing code, we should always keep in mind that when converting between string and binary buffer, we should always specify an encoding. Using Buffer#toString() or ascii is temporary, not a correct solution.

If you can read SqlServer's encoding setting. https://stackoverflow.com/questions/5182164/sql-server-default-character-encoding It is best and correct. If just adding an encoding in tedious option(or request.addParameter option), I think it is good enough, it makes teidous more flexible that user can store any encoding in varchar columns(though it is incorrect when sorting the column).

Thanks.

@Suraiya-Hameed
Copy link
Member

Using collation received in ENVCHANGE wouldn't be an ideal solution, as it's possible to have different collation for column, database and SQL Server 🤔

@Suraiya-Hameed
Copy link
Member

Suraiya-Hameed commented May 19, 2018

Both ADO and JDBC driver are sending string as NVarchar by default. Here is the TSQL sent by the drivers (captured via SQL Profiler)

-- JDBC
exec sp_executesql N'INSERT INTO c936_table (c1) VALUES(@P0)',N'@P0 nvarchar(4000)',N' 中文'

--ADO
exec sp_executesql N'INSERT INTO [dbo].[c936_table] ([c1]) VALUES (@c1);',N'@c1 nvarchar(11)',@c1=N'中文'

--tedious
exec sp_executesql @statement=N'insert into c936_table values (@c1);',@params=N'@c1 varchar(2)',@c1='-‡'

I was sending value '中文' as string/varchar in all 3 cases, and this was the result. Passing string as NVarchar by default seems to be a better alternative than using database collation on all of the char/varchar columns. Using NVarchar too comes at a cost, but considering that collation can be set at any level (expression, column, database or Server) its better to have a solution to handle them all or none at all.

@seandongjie
Copy link

@v-suhame
You are right to me. Thanks.

@arthurschreiber
Copy link
Collaborator

I'm reopening because this is still an issue - see the discussion in #1294.

@heroboy You posted some code to reproduce this behaviour over in #723 (comment). Do I need to tweak any additional server or database settings (e.g. change the server collation) to reproduce this correctly? 🙇‍♂️

@heroboy
Copy link
Contributor Author

heroboy commented Jul 22, 2021

@arthurschreiber
No. I think what ever collation sql server is, it will produce incorrect output.
But if you modify the validate function in my sample code, and want to make this work:

request.addParameter('v3', tedious.TYPES.VarChar, iconv.encode('中文', 'cp936'),{length:128})

You should set sql server to match the cp936 encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants