New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISTINCT doesn't work for 2.0.0-alpha1 #99

Open
Loriowar opened this Issue Sep 25, 2016 · 15 comments

Comments

Projects
None yet
4 participants
@Loriowar
Copy link

Loriowar commented Sep 25, 2016

Query SELECT DISTINCT "column_name" FROM foreign_table; doesn't work. This return duplicate entries in case of usage tds-fdw from branch 2.0.0-alpha1. Branch 1.0.7 works as expected for the same query and the same environment.

My env:

  • Ubuntu 14.04
  • PostgreSQL 9.5
  • MSSQL 2012
  • freetds-common 0.91-5
@GeoffMontee

This comment has been minimized.

Copy link
Collaborator

GeoffMontee commented Sep 26, 2016

The 2.0.0-alpha1 release is a few commits behind master right now, so it is missing some fixes. Does master have this problem for you? It seems to work as expected for me:

postgres=# select * from mssql_table;
 id | col
----+-----
  1 |   1
  2 |   1
  3 |   1
(3 rows)

postgres=# select distinct col from mssql_table;
WARNING:  Table definition mismatch: Could not match local column id with column from foreign table
 col
-----
   1
(1 row)
@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Sep 29, 2016

Same problem in master branch of tds_fdw (commit 3a803c5) with latest FreeTDS version (1.00).

I have foreign table:

CREATE FOREIGN TABLE wrc_ctrs (
    "WMSLOCATIONID" varchar,
    ...)
SERVER mssql_server
OPTIONS (
    query 'SELECT [WRKCTRTABLE].[WMSLOCATIONID],
                  ...
           FROM   [WRKCTRTABLE]',
    row_estimate_method 'execute');

And quite simple query:

SELECT DISTINCT "WMSLOCATIONID" AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

As result I receive:

wmslocationid 
---------------
 Механ
 Новый цех
 Механ
 Новый цех
 Механ
 Гравировка
 Механ
 Гравировка
 ...

(126 rows)

Here is lots of duplications. But, resulted amount of rows is less than simple count on same query.

Moreover, for COUNT with DISTINCT all works as expected:

SELECT COUNT(DISTINCT "WMSLOCATIONID") AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

wmslocationid 
---------------
            18

Total count is much more:

SELECT COUNT("WMSLOCATIONID") AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

wmslocationid 
---------------
          2755

For branch 1.0.7 all above queries works correct.

Is this a "magic" or somehow related with locale? May you help me in this situation or problem can be somewhere else? I can provide logs and any other information about env if needed.

In all described cased env is absolutely identical (same PG and MSSQL, same tables, same data, same initialize options and configs of the FDW and so on). Differences only in tds_fdw and FreeTDS versions.

@GeoffMontee

This comment has been minimized.

Copy link
Collaborator

GeoffMontee commented Nov 2, 2016

Hi @Loriowar,

I'm not entirely sure what is causing this problem. WHERE and column pushdowns aren't even performed when using the query foreign table option.

Would you be willing to provide a data set and table definitions that can reproduce this problem? That would help a lot.

@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Dec 28, 2016

Sorry for a huge delay. I can't provide a full data set, only a table definition:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[WRKCTRTABLE](
  [WRKCTRGROUPID] [nvarchar](10) NOT NULL,
  [WRKCTRID] [nvarchar](10) NOT NULL,
  [NAME] [nvarchar](140) NOT NULL,
  [WRKCTRTYPE] [int] NOT NULL,
  [WRKCTRNUMOF] [numeric](28, 12) NOT NULL,
  [EFFECTIVITYPCT] [numeric](28, 12) NOT NULL,
  [OPERATIONSCHEDPCT] [numeric](28, 12) NOT NULL,
  [CAPACITY] [numeric](28, 12) NOT NULL,
  [CAPUNIT] [int] NOT NULL,
  [VENDID] [nvarchar](20) NOT NULL,
  [ACCOUNTWRKCTRISSUE] [nvarchar](20) NOT NULL,
  [ACCOUNTWIPVALUATION] [nvarchar](20) NOT NULL,
  [CREATED] [datetime] NOT NULL,
  [QUEUETIMEBEFORE] [numeric](28, 12) NOT NULL,
  [SETUPTIME] [numeric](28, 12) NOT NULL,
  [PROCESSTIME] [numeric](28, 12) NOT NULL,
  [PROCESSPERQTY] [numeric](28, 12) NOT NULL,
  [TRANSPTIME] [numeric](28, 12) NOT NULL,
  [QUEUETIMEAFTER] [numeric](28, 12) NOT NULL,
  [TRANSFERBATCH] [numeric](28, 12) NOT NULL,
  [TOHOURS] [numeric](28, 12) NOT NULL,
  [ERRORPCT] [numeric](28, 12) NOT NULL,
  [SETUPCATEGORYID] [nvarchar](10) NOT NULL,
  [PROCESSCATEGORYID] [nvarchar](10) NOT NULL,
  [DIMENSION] [nvarchar](10) NOT NULL,
  [DIMENSION2_] [nvarchar](10) NOT NULL,
  [DIMENSION3_] [nvarchar](10) NOT NULL,
  [ISGROUP] [int] NOT NULL,
  [ACCOUNTWIPISSUE] [nvarchar](20) NOT NULL,
  [PRODUNITID] [nvarchar](10) NOT NULL,
  [SITEID] [nvarchar](10) NOT NULL,
  [BOTTLENECKRESOURCE] [int] NOT NULL,
  [WMSLOCATIONID] [nvarchar](10) NOT NULL,
  [EMPLID] [nvarchar](20) NOT NULL,
  [CALENDARID] [nvarchar](10) NOT NULL,
  [CAPLIMITED] [int] NOT NULL,
  [INVENTLOCATIONID] [nvarchar](10) NOT NULL,
  [CAPACITYBATCH] [numeric](28, 12) NOT NULL,
  [EXCLUSIVE] [int] NOT NULL,
  [QTYCATEGORYID] [nvarchar](10) NOT NULL,
  [ROUTEGROUPID] [nvarchar](10) NOT NULL,
  [ACCOUNTWRKCTRISSUEOFFSET] [nvarchar](20) NOT NULL,
  [DATAAREAID] [nvarchar](4) NOT NULL,
  [RECVERSION] [int] NOT NULL,
  [RECID] [bigint] NOT NULL,
  [TASKGROUPID] [nvarchar](10) NOT NULL,
  [WRKCTRTASKDEMAND] [numeric](28, 12) NOT NULL,
  [PROPERTYLIMITED] [int] NOT NULL,
 CONSTRAINT [I_266WRKCTRIDX] PRIMARY KEY NONCLUSTERED 
(
  [DATAAREAID] ASC,
  [WRKCTRID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WRKCTRGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WRKCTRID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [NAME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRTYPE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRNUMOF]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [EFFECTIVITYPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [OPERATIONSCHEDPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPACITY]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPUNIT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [VENDID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWRKCTRISSUE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWIPVALUATION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('1900-01-01 00:00:00.000') FOR [CREATED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [QUEUETIMEBEFORE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [SETUPTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROCESSTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROCESSPERQTY]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TRANSPTIME]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [QUEUETIMEAFTER]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TRANSFERBATCH]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [TOHOURS]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [ERRORPCT]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [SETUPCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [PROCESSCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION2_]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [DIMENSION3_]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [ISGROUP]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWIPISSUE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [PRODUNITID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [SITEID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [BOTTLENECKRESOURCE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [WMSLOCATIONID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [EMPLID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [CALENDARID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPLIMITED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [INVENTLOCATIONID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [CAPACITYBATCH]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [EXCLUSIVE]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [QTYCATEGORYID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ROUTEGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [ACCOUNTWRKCTRISSUEOFFSET]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('dat') FOR [DATAAREAID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((1)) FOR [RECVERSION]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ('') FOR [TASKGROUPID]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [WRKCTRTASKDEMAND]
GO

ALTER TABLE [dbo].[WRKCTRTABLE] ADD  DEFAULT ((0)) FOR [PROPERTYLIMITED]
GO

ALTER TABLE [dbo].[WRKCTRTABLE]  WITH CHECK ADD CHECK  (([RECID]<>(0)))
GO
@Smtgr14

This comment has been minimized.

Copy link

Smtgr14 commented Dec 29, 2016

Do you get a correct result if you rewrite the query in this way?

SELECT DISTINCT ON ("WMSLOCATIONID") COUNT(*) AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';
@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Jan 25, 2017

This query still return duplications

SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM  wrc_ctrs
WHERE "WMSLOCATIONID" != '';

Your variant has incorrect syntax for PG. I don't know how can work DISTINCT ON and COUNT in this query.

@Smtgr14

This comment has been minimized.

Copy link

Smtgr14 commented Jan 26, 2017

Apologies, I forgot the GROUP BY "WMSLOCATIONID" at the end.
Should be:

SELECT DISTINCT ON ("WMSLOCATIONID") "WMSLOCATIONID" AS wmslocationid
FROM wrc_ctrs
WHERE "WMSLOCATIONID" != ''
GROUP BY "WMSLOCATIONID";
@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Jan 26, 2017

This query doesn't work. It return duplications too.

@Smtgr14

This comment has been minimized.

Copy link

Smtgr14 commented Jan 27, 2017

Since pushdowns are not working, try this query:

SELECT DISTINCT ON ("wmslocationid") FROM
  (SELECT "WMSLOCATIONID" AS wmslocationid
  FROM wrc_ctrs
  WHERE "WMSLOCATIONID" != '') a
GROUP BY "WMSLOCATIONID";

@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Jan 27, 2017

Little changes due to postgres syntax:

SELECT DISTINCT ON ("wmslocationid") "wmslocationid" FROM
   (SELECT "WMSLOCATIONID" AS wmslocationid
    FROM wrc_ctrs
    WHERE "WMSLOCATIONID" != '') a
GROUP BY "wmslocationid";

And I see a duplications in the result again.

@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Jan 27, 2017

An important question: must I update tds_fdw to the latest master commit? For now, all queries runs on tds_fdw compiled in the moment of commit 3a803c5.

@Smtgr14

This comment has been minimized.

Copy link

Smtgr14 commented Jan 27, 2017

When wrapping the query in a from clause to perform the SELECT DISTINCT, the handling of DISTINCT is no longer on tds_fdw or MSSQL, it's pure PostgreSQL at this point. if there are duplicates, check to make sure there isn't something about them that you can't see that makes them distinct, like spaces of other invisible characters. A last stitch effort is to use the query to fill a temporary table and run SELECT DISTINCT against it. If there are duplicates it's not tds_fdw, it's definitely something else. Try creating the table locally, fill it using a query for the raw data from mssql and then query this new table with distinct.

@Loriowar

This comment has been minimized.

Copy link
Author

Loriowar commented Jan 27, 2017

For now, I use cascade of materialized views. The first view is identical to the foreign table (wrc_ctrs) and it store data one to one without any modification. Second view store only distinct records from the first view. In this case all works perfect and there is no duplications. Other variant: I can install stable version of tds_fdw (version 1.0.7) and problem with distinct disappear. I can check your last variants of queries on tds_fdw with versions 1.0.8 and master. May this can help?

In a first time I thought about whitespace or something else in the column, but there is nothing. Other question: may this appear due to different encoding between databases or due to russian characters in column value? If so, how I can check this? And finally, is this a problem of tds_fdw or this is a problem of my local installation?

@Smtgr14

This comment has been minimized.

Copy link

Smtgr14 commented Jan 27, 2017

The russian characters might be the cause. @GeoffMontee would be able to explain better. I'm no good at encoding logic.

@smithje

This comment has been minimized.

Copy link

smithje commented Mar 2, 2017

I'm having this same issue with the current head. In my case, it seems related to case-sensitivity.

I am using tds_version 7.4, freetds 1.00.26, SQL Server 2008 R2, and postgresql 9.3. The SQL Server database is using collation SQL_Latin1_General_CP1_CI_AS. The postgres database is using en_us.UTF-8 for LC_COLLATE and LC_CTYPE. The encoding is UTF8. I don't believe any of these settings are particularly exotic.

 CREATE FOREIGN TABLE test_tds_fdw (
    UserID int,
    FirstName text,
    LastName text,
    UserName text,
    Organization text,
    Email text,
    Type int,
    Password text
)
SERVER test_server
OPTIONS (
    table 'Users',
    match_column_names 'true'
);

=> select distinct organization from test_tds_fdw where organization = 'self';
 organization 
--------------
 Self
 self
 Self
 self
 Self
(5 rows)

=> select distinct organization from 
    (select distinct organization from test_tds_fdw where organization = 'self') tds_sub;
 organization 
--------------
 self
 Self
(2 rows)

=> select distinct organization from test_tds_fdw where organization ilike 'self';
ERROR:  HV00L: DB-Library error: DB #: 20018, DB Msg: General SQL Server error: Check messages from the SQL Server, OS #: -1, OS Msg: , Level: 15
LOCATION:  tds_err_handler, tds_fdw.c:3856

Please let me know if there's any other information I can provide. Unfortunately, I only have read access to the SQL Server, so there's a limited amount I can do on that end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment