Add support for new character sets #1
Comments
From the FreeTDS localization page: "To learn what character set the client wants, FreeTDS prefers the applicable freetds.conf client charset property. If that is not set, it parses the LANG environment variable. In either case, the found string is passed to iconv(3) (or its built-in replacement). [1]. If neither is found, UCS-2 data are converted to ISO 8859-1." |
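The lookup order quoted above can be sketched roughly as follows. This is only an illustration in Python, not FreeTDS's actual code: the function name `resolve_client_charset` and the dict standing in for freetds.conf are hypothetical, and the handling of a `LANG` value without a codeset (such as `C`) is an assumption, since the quote does not cover it.

```python
import os

def resolve_client_charset(conf, env=None):
    """Sketch of the lookup order described on the FreeTDS localization page.

    `conf` is a hypothetical dict of freetds.conf properties; FreeTDS itself
    reads the file in C and passes the resulting name to iconv.
    """
    env = os.environ if env is None else env
    # 1. Prefer the "client charset" property from freetds.conf.
    charset = conf.get("client charset")
    if charset:
        return charset
    # 2. Otherwise parse LANG, e.g. "pt_BR.UTF-8" -> "UTF-8".
    lang = env.get("LANG", "")
    if "." in lang:
        # Drop an optional "@modifier" suffix such as "@euro".
        return lang.split(".", 1)[1].split("@", 1)[0]
    # 3. If neither is found, fall back to ISO 8859-1 (assumed for LANG=C too).
    return "ISO-8859-1"
```

With this sketch, a `client charset` entry always wins over `LANG`, which matches the report that setting it in freetds.conf was enough for one user.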
One user said that setting "client charset" in freetds.conf to UTF-8 worked. |
Today a foreign table returned the charset-related error below:
I tried putting the configuration
Doesn't the message indicate that it is just a warning? |
Hi @sumariva, I believe that Can you please show the full contents of your freetds.conf? |
Yes, this is my freetds.conf
|
Thanks @sumariva! The definition of Can you show me the definition of your foreign server and foreign table please? |
Relative to postgresql conversions, there are iconv like functions on strings: |
The character set conversions are done by FreeTDS using the iconv library before PostgreSQL sees the strings, so those functions shouldn't be necessary for this. |
The configured server:
CREATE SERVER siarm
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (servername '192.168.2.30', port '1433');
The table mapping the system information tables; here I also noticed a weird character, ASCII(2), at the end of a column:
CREATE FOREIGN TABLE sgm.information_schema_columns_tds (
table_catalog varchar
, table_schema varchar
, table_name varchar
, column_name varchar
, ordinal_position integer
, column_default varchar
, is_nullable varchar
, data_type varchar
, character_maximum_length integer
, character_octect_length integer
, numeric_precision integer
, numeric_precision_radix integer
, numeric_scale integer
, datetime_precision integer
, interval_type varchar
, interval_precision integer
, character_set_catalog varchar
, character_set_schema varchar
, character_set_name varchar
, collation_catalog varchar
, collation_schema varchar
, collation_name varchar
, domain_catalog varchar
, domain_schema varchar
, domain_name varchar
, udt_catalog varchar
, udt_schema varchar
, udt_name varchar
, scope_catalog varchar
, scope_schema varchar
, scope_name varchar
, maximun_cardinality integer
, dtd_identifier varchar
/*
, is_self_referencing varchar
, is_identity varchar
, identity_generation varchar
, identity_start varchar
, identity_increment varchar
, identity_maximum varchar
, identity_minimum varchar
, identity_cycle varchar
, is_generated varchar
, generation_expression varchar
, is_updatable varchar */
)
SERVER siarm
OPTIONS ( query 'SELECT * FROM information_schema.columns ORDER BY table_schema, table_name, ordinal_position' );
The table as harvested using the system information table:
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos"
(
"Inscricao" integer NOT NULL,
"ID" smallint NOT NULL,
"Processo" character varying(12) ,
"Historico" text NOT NULL,
"Data" timestamp without time zone ,
"Departamento" smallint ,
"Data_Atualizacao" timestamp without time zone
)
SERVER siarm
OPTIONS (table 'MV_IPTU_Contribuintes_CadastroHistoricos' ); |
Your foreign server isn't using the Otherwise, you should set |
Hmm, I updated freetds.conf as suggested:
[global]
# TDS protocol version
tds version = 7.0
client charset = UTF-8
[duquedecaxias]
host = 192.168.2.30
port = 1433
tds version = 7.0
client charset = UTF-8
but I still get the same message.
SHOW client_encoding
|
Did you restart the PostgreSQL server after making the change to freetds.conf? |
Yes, the PostgreSQL server has been restarted after each change, since I do not know when those parameters are reread by the tds_fdw driver. |
Maybe your configuration is being read correctly, but one of your rows has non-UTF-8 data.
What do you mean by this? Are you saying that |
When I executed a select statement on that system information table (this is a database internal table), I got the following result in psql:
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octect_length | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | domain_catalog | domain_schema | domain_name | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximun_cardinality | dtd_identifier
---------------+--------------+--------------------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+-------------------------+-------------------+-------------------------+---------------+--------------------+---------------+--------------------+-----------------------+----------------------+--------------------+-------------------+------------------+----------------+----------------+---------------+-------------+-------------+------------+----------+---------------+--------------+------------+---------------------+----------------
SIARM_CAXIAS | dbo | MV_IPTU_Contribuintes_Cadastro | Inscricao | 1 | | NO | int | | | 10 | 10 | 0 | | | | | | | | | | | | | | | | | | | | Y\x02
Note the \x02 character reported in the last column, named dtd_identifier; this is what I called ASCII(2). |
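A stray byte like the \x02 above is easy to miss in psql output. As a minimal sketch, the following Python helper (the name `find_control_chars` is hypothetical, not part of tds_fdw or FreeTDS) lists ASCII control characters in a fetched value, which could help when checking columns one by one on the client side.

```python
def find_control_chars(value):
    """Return (index, code point) pairs for ASCII control characters in value.

    Tab, newline and carriage return are skipped because they are common in
    legitimate text columns.
    """
    return [(i, ord(ch)) for i, ch in enumerate(value)
            if ord(ch) < 0x20 and ch not in "\t\n\r"]

# The dtd_identifier value shown in the psql output above.
print(find_control_chars("Y\x02"))  # [(1, 2)]
```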
I found that error code 2403 is defined in FreeTDS as a constant called SYBEICONVI. |
As far as I can tell, the column https://msdn.microsoft.com/en-us/library/ms188348.aspx
This probably means that this column is being populated with whatever random bytes are already in memory when tds_fdw is fetching results. This looks like a new bug, so I've submitted that here:
Either way, this seems completely unrelated to any character set issues you are experiencing. It still sounds to me like you may have data in one of your rows that is incompatible with UTF-8. Maybe try to find out which column is failing? (e.g. maybe try fetching only one varchar column at a time?)
Also, what version of MS SQL Server are you using? If you are using 2008 or later, maybe try |
I don't know if this warning should be treated in a non-fatal manner without knowing exactly what is going wrong here for you. Can you please try to find out what data is failing? You should be able to find out which
Then you could figure out which column is failing conversion with:
|
I followed your suggestions and found that the failing column is the one in the second query, SELECT * FROM cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico"; the failure occurs after more than 10000 rows have been fetched. |
Digging into the MSSQL information_schema.columns table, I found that the type used on column
The following attempt to translate the characters to a binary representation failed with the error reported in the footer:
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico"
(
"Historico" character varying
)
SERVER siarm
OPTIONS (query 'Select CAST( CAST( Historico AS varchar(max) ) AS varbinary(max) ) AS Historico FROM MV_IPTU_Contribuintes_CadastroHistoricos' );
If I understood the documentation correctly, when the text type is used, the database stores the data in the database's configured code page, which could be a single-byte encoding rather than UCS-2 (Unicode). |
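The difference between a single-byte code page and UCS-2 can be seen in a short Python sketch. The sample word and the choice of CP1252 as the code page are assumptions for illustration; the actual code page of the database in this thread is whatever its collation defines.

```python
text = "Histórico"  # made-up sample value containing one accented character

cp1252 = text.encode("cp1252")    # single-byte code page: one byte per character
ucs2 = text.encode("utf-16-le")   # same bytes as UCS-2 for characters in the BMP
utf8 = text.encode("utf-8")       # the accented character takes two bytes

print(len(cp1252), len(ucs2), len(utf8))  # 9 18 10
```

The same text therefore has three different byte representations, and a conversion step has to know which one it is reading.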
You might want to try downloading the most recent commit of tds_fdw and then compiling tds_fdw with You might also want to consider enabling a FreeTDS log file. |
Just for reference I record the server version used:
|
I am currently trying to read the column "Historico" (history) as binary:
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_HistoricoHex"
("Historico" text NOT NULL)
SERVER siarm
OPTIONS (query 'SELECT convert(varbinary(8000), convert(varchar(8000),Historico), 0) AS Historico FROM MV_IPTU_Contribuintes_CadastroHistoricos');
A simple select in PostgreSQL showed that the data arrived:
select
char_length( "Historico" ), octet_length( "Historico" ), "Historico", encode( "Historico"::bytea, 'hex' )
--, convert_to( "Historico"::bytea, 'iso_8859_1') mssql_varchar_from_varbinary
from
cliente."MV_IPTU_Contribuintes_CadastroHistoricos_HistoricoHex"
limit 100 |
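Once the hex form of a value is available, as with the `encode(..., 'hex')` column in the query above, it can be checked against candidate encodings outside the database. The following is a minimal Python sketch; the helper name `try_decodings` is hypothetical and the sample hex string is made up for illustration, not taken from the data in this thread.

```python
def try_decodings(hex_value, encodings=("utf-8", "cp1252", "iso-8859-1")):
    """Decode a hex dump with each candidate encoding and report the outcome."""
    raw = bytes.fromhex(hex_value)
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError as exc:
            # Keep the failure visible instead of raising, so all
            # candidates can be compared side by side.
            results[enc] = f"<failed: {exc.reason}>"
    return results

# Made-up sample: "Histórico" stored in a single-byte code page.
for name, outcome in try_decodings("48697374f37269636f").items():
    print(name, "->", outcome)
```

A value that fails under utf-8 but decodes cleanly under a single-byte code page suggests the bytes were never Unicode to begin with.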
I had the same problem. What happened: I configured my
If I fetch integer columns, FreeTDS works well, but if I fetch text columns, error 2403 occurs. To find a solution, I tried different charsets and, finally, the
Follow my
It seems that the SQL Server has data stored in the "CP1252" encoding, so iconv returns an error because it cannot interpret some of the characters stored in the database. With "CP1252", the conversion completed correctly and without errors. Setup info |
Currently, tds_fdw can't set the character set for the connection.
The FreeTDS implementation of DB-Library does have the DBSETLCHARSET macro to set the character set:
http://www.freetds.org/reference/a00284.html#ga114
However, it says it doesn't work on TDS 7.0+ connections. This makes it less useful for most versions of MS SQL Server, since those try to use 7.0+ by default:
http://www.freetds.org/userguide/choosingtdsprotocol.htm
The reason it is not supported for 7.0+ is described here:
http://www.freetds.org/userguide/localization.htm
"It is also worth clarifying that TDS 7.0 and above do not accept any specified character set during login, as 4.2 does. A TDS 7.0 login packet uses UCS-2."
There's also the dbsetdefcharset function:
http://www.freetds.org/reference/a00284.html#ga86
But it looks like that might be unimplemented in the current versions of FreeTDS.
Character sets can be set in freetds.conf:
http://www.freetds.org/userguide/freetdsconf.htm
It is probably possible to change character sets with DB-Library in a way that is compatible with versions 7.0+ of TDS also. I should figure out how.
On the PostgreSQL side, I also wonder how BuildTupleFromCStrings will handle character sets with multi-byte characters. I might need to find another way to build tuples.
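One relevant property here is that BuildTupleFromCStrings takes NUL-terminated C strings, so any encoding that can contain zero bytes inside a value would be cut short. The Python sketch below only illustrates that property of the encodings; it does not exercise PostgreSQL, and the helper name `survives_c_string` is hypothetical.

```python
def survives_c_string(encoded):
    """True if the bytes contain no NUL, so a C string would not be truncated."""
    return b"\x00" not in encoded

text = "Histórico"  # made-up sample value
print(survives_c_string(text.encode("utf-8")))      # True
print(survives_c_string(text.encode("utf-16-le")))  # False
```

Multi-byte UTF-8 never produces a zero byte for a non-NUL character, so it is safe in this respect, whereas raw UCS-2 data would need converting before being handed over as a C string.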