Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift catalog extract #23

Closed
dhuettenmoser opened this issue Jun 16, 2021 · 12 comments
Closed

Redshift catalog extract #23

dhuettenmoser opened this issue Jun 16, 2021 · 12 comments
Assignees

Comments

@dhuettenmoser
Copy link

dhuettenmoser commented Jun 16, 2021

WHERE TABLE_SCHEMA NOT IN ('information_schema', 'pg_catalog')

in dbcat.catalog.db line 175 should be: WHERE SCHEMA NOT IN ('information_schema', 'pg_catalog')
this supplemental where clause is added after the extractor renames the table_schema columns to schema

@siva-mudiyanur
Copy link

I guess 'table_schema' in where clause seems to be fine as its going to be used to query against catalog tables, column 'schema' is being returned by RedshiftMetadataExtractor

@vrajat
Copy link
Member

vrajat commented Jun 16, 2021

Good catch. Thanks for pointing this out. I am not sure how I missed this when I tested with Redshift. This is also true for postgres and mysql. However postgres and mysql do not have unions or joins in their SQL queries w.r.t to schema and the extractors worked silently.

@dhuettenmoser or @siva-mudiyanur , do you have access to a redshift database that you can test fix with?

@vrajat
Copy link
Member

vrajat commented Jun 16, 2021

I just tested using "schema" for postgres and mysql. The change does not work because postgres does not support alias in where clause.

@vrajat vrajat self-assigned this Jun 16, 2021
@dhuettenmoser
Copy link
Author

I've only tested against redshift. I modified the local library file in order to get the redshift extraction to work, and changed that line to WHERE SCHEMA NOT IN ('information_schema', 'pg_catalog')
Prior to this I was getting "column "table_schema" does not exist" errors, after making the adjustment the extractor worked as expected. Thanks for the quick response!

@vrajat
Copy link
Member

vrajat commented Jun 16, 2021

Got it. Will you be able to test a patch version? I can release 0.5.4 with this fix. I will not be able to test redshift support for a few more days.

@dhuettenmoser
Copy link
Author

yes, I can test that out.

vrajat added a commit to vrajat/dbcat that referenced this issue Jun 17, 2021
vrajat added a commit that referenced this issue Jun 17, 2021
@vrajat
Copy link
Member

vrajat commented Jun 17, 2021

I've released v0.5.3. Can you check if that fixes this bug?

@dhuettenmoser
Copy link
Author

Thanks for making that adjustment! I'll test it out in the morning to verify.

@siva-mudiyanur
Copy link

I may not get a chance to test soon but I would definitely make sure to test by end of this week

@dhuettenmoser
Copy link
Author

This fix works in my case. Thanks!

@vrajat
Copy link
Member

vrajat commented Jun 18, 2021

Thanks. I'll close this ticket. @siva-mudiyanur if you can confirm, I'll be grateful.

@vrajat vrajat closed this as completed Jun 18, 2021
@siva-mudiyanur
Copy link

Hi @vrajat ..I've been getting errors when I try to update the package since the issue is fixed, am I missing something or package versions needs to be loosened up in the code?

INFO: pip is looking at multiple versions of dbcat to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.0 and amundsen-databuilder[athena,bigquery,glue,rds,snowflake]==5.2.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.2 depends on mysqlclient<3 and >=1.3.6; extra == "rds"
    amundsen-databuilder[athena,bigquery,glue,rds,snowflake] 5.2.0 depends on mysqlclient<3 and >=1.3.6; extra == "rds"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants