Deep Scan in Athena failing with HIVE_UNKNOWN_ERROR #63

Closed
vrajat opened this issue Feb 13, 2020 · 13 comments
Labels
support Support Requests

Comments

@vrajat
Member

vrajat commented Feb 13, 2020

Yeah, here it is:
debug_log.txt

Originally posted by @jayeshagwan1 in #59 (comment)

@vrajat vrajat self-assigned this Feb 13, 2020
@vrajat vrajat added the support Support Requests label Feb 13, 2020
@vrajat
Member Author

vrajat commented Feb 13, 2020

@jayeshagwan1 can you run select count(*) from temp.pii_data in the Athena console?

Also, the debug log seems to be truncated. Can you make sure that it contains all of the output?

@jayeshagwan1

I have the complete log; I only extracted and shared a portion of it. I will provide the complete log file shortly.

@vrajat
Member Author

vrajat commented Feb 13, 2020

I want to check these types of lines:

DEBUG:root:Count Query: select count(*) from temp.pii_data 
DEBUG:pyathena.common:select count(*) from temp.pii_data

@jayeshagwan1

jayeshagwan1 commented Feb 13, 2020

Not sure, but will this work?
logs.txt

@vrajat
Member Author

vrajat commented Feb 13, 2020

I still see only select count(*) from temp.pii_data in the logs. I don't see any exceptions or errors. I want to know the last query that was run before the exception or error happened.
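
For anyone hitting the same problem, here is a minimal sketch of how the last query before a failure could be pulled out of a debug log. It assumes queries are logged with the DEBUG:pyathena.common: prefix shown above and that a failure appears as a line starting with ERROR:, CRITICAL:, or a Python traceback header. Those error markers and the function name last_query_before_error are assumptions for illustration, not something piicatcher provides.

```python
import re
from typing import Iterable, Optional

# Assumed log format, based on the lines quoted earlier in this thread:
#   DEBUG:pyathena.common:<sql text>
QUERY_PREFIX = "DEBUG:pyathena.common:"

# Assumed failure markers; adjust these to match the actual log.
ERROR_PATTERN = re.compile(
    r"^(ERROR:|CRITICAL:|Traceback \(most recent call last\))"
)


def last_query_before_error(lines: Iterable[str]) -> Optional[str]:
    """Return the last query logged before the first error line.

    Returns None if the log has no error line, or if no query
    was logged before the first error.
    """
    last_query = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(QUERY_PREFIX):
            # Remember the most recent query text seen so far.
            last_query = line[len(QUERY_PREFIX):].strip()
        elif ERROR_PATTERN.match(line):
            # Stop at the first failure and report what ran just before it.
            return last_query
    return None


if __name__ == "__main__":
    import sys

    # Usage: python last_query.py debug_log.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(last_query_before_error(fh))
```

If this prints None, the log probably does not contain the failure at all, which would point to a truncated log.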

@jayeshagwan1

Sorry. Here is the log file from a run where I don't hard-code the schema and table and just do select *:
error_Log.txt

@jayeshagwan1

Even after passing the schema and table from the config, it's still scanning all tables. It should fetch only the specified schema and table.

@vrajat
Member Author

vrajat commented Feb 13, 2020

Can you test this query in the Athena console? select * from sampledb.elb_logs TABLESAMPLE BERNOULLI(5) limit 10

sampledb.elb_logs is created automatically, and I think either it is not set up properly or your account doesn't have access to it.

Can you try piicatcher ... --exclude-schema sampledb?

I'll look into "Even after passing schema and table from config, its still looking into all tables. It should fetch only mentioned schema and table." separately.

@jayeshagwan1

jayeshagwan1 commented Feb 13, 2020

piicatcher ... --exclude-schema sampledb throws an error:
error.txt

@vrajat
Member Author

vrajat commented Feb 13, 2020

I just released a new version, mostly with debug logging improvements. I do not know if it will fix all your issues. Please give it a try without any edits, and please attach the debug logs for any issues you run into.

On filtering tables, check my comment in #65 on the right syntax.

@jayeshagwan1

In the new version it's working fine.
However, some of the columns are not being identified as the expected PII data type.

@vrajat
Member Author

vrajat commented Feb 13, 2020 via email

@jayeshagwan1

Will this work? The deep scan and shallow scan results are different.
deepscan.txt
shallowscan.txt


2 participants