Hive JDBC support #54

Closed
jayeshagwan1 opened this issue Jan 27, 2020 · 31 comments

Comments

@jayeshagwan1

Does piicatcher support Hive?

@vrajat
Member

vrajat commented Jan 27, 2020 via email

@jayeshagwan1
Author

I am interested in contributing.

@vrajat
Member

vrajat commented Jan 28, 2020

Thanks!
Here are some guidelines on how to get started:

Install a developer version of piicatcher

  1. Fork the repo.
  2. Instructions are here: https://tokern.io/docs/piicatcher/development

Hive installation

I am not sure about your tech setup. A web search should turn up plenty of websites with instructions for setting up Hive.

Load data into Hive

I use a couple of simple datasets:

  1. https://github.com/tokern/piicatcher/blob/master/tests/test_databases.py#L19
  2. https://github.com/tokern/piicatcher/blob/master/tests/samples/sample-data.csv

Add pyhive

Add pyhive as a requirement in requirements.txt

Rerun pipenv update to install pyhive.

Write an explorer

An explorer is the base class for supporting different types of technologies.
You can use AWS Explorer as an example.

You'll have to:

  1. Create a new Python file, for example hive.py.
  2. Implement a cli function.
  3. Implement a HiveExplorer class similar to AthenaExplorer.
  4. Change the code in each function to make it work with Hive. For example, all the queries have to be changed, and pyhive has to be used instead of pyathena.
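
The steps above could be sketched roughly as follows. This is only an illustration and not piicatcher's actual Explorer interface: `HiveExplorerSketch`, `get_schemas`, `get_tables` and `get_columns` are hypothetical names, and the connection is assumed to be any DB-API style object, such as one returned by `pyhive.hive.connect`. The HiveQL statements stand in for the Athena-specific catalog queries and assume Hive's usual result shapes (one column for `SHOW TABLES`, name/type/comment for `DESCRIBE`).

```python
class HiveExplorerSketch:
    """Hypothetical explorer that walks a Hive catalog through a DB-API connection."""

    def __init__(self, connection):
        # `connection` only needs a cursor() method, e.g. pyhive.hive.connect(...).
        self._connection = connection

    def _query(self, sql):
        # Run one statement and return all rows, always closing the cursor.
        cursor = self._connection.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            cursor.close()

    def get_schemas(self):
        return [row[0] for row in self._query("SHOW DATABASES")]

    def get_tables(self, schema):
        # Names are interpolated unescaped; they are assumed to come from the catalog itself.
        return [row[0] for row in self._query("SHOW TABLES IN {}".format(schema))]

    def get_columns(self, schema, table):
        rows = self._query("DESCRIBE {}.{}".format(schema, table))
        columns = []
        for row in rows:
            name = row[0]
            # For partitioned tables, DESCRIBE appends a blank row and
            # '#'-prefixed rows before repeating the partition columns.
            if not name or name.startswith("#"):
                break
            columns.append((name.strip(), row[1]))
        return columns
```

With a live server this might be used as `HiveExplorerSketch(hive.connect(host="localhost", port=10000))` after `from pyhive import hive`; the host and port here are placeholders.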

I can answer any questions while you develop.

@jayeshagwan1
Author

Thanks @vrajat. I will follow the above steps and let you know if I run into any issues.

@jayeshagwan1
Author

I followed the steps above. Running the command piicatcher --config hiveconfig.ini hive gives the error below:
image

It seems to be an issue with installing pyhive on Windows.

@vrajat
Member

vrajat commented Jan 28, 2020 via email

@jayeshagwan1
Author

image

@vrajat
Member

vrajat commented Jan 29, 2020 via email

@jayeshagwan1
Author

Thanks. Now I am facing:

image

@vrajat
Member

vrajat commented Jan 29, 2020 via email

@jayeshagwan1
Author

Hive2

@vrajat
Member

vrajat commented Jan 29, 2020 via email

@jayeshagwan1
Author

yes

@vrajat
Member

vrajat commented Jan 29, 2020 via email

@jayeshagwan1
Author

I am now able to connect to HiveServer2, but I get the error below:

raise ValueError("Password should be set if and only if in LDAP or CUSTOM mode; "
ValueError: Password should be set if and only if in LDAP or CUSTOM mode; Remove password or use one of those modes

Currently I am passing auth='NOSASL' in the connection. If I pass auth='Custom or none', I get this error:

image
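
As the error message states, pyhive wants a password only when the auth mode is LDAP or CUSTOM. One way to respect that rule is to build the connection arguments in a small helper. This is a sketch under assumptions: `build_hive_kwargs` is a hypothetical name, port 10000 is assumed as the HiveServer2 port, and upper-casing the mode assumes that mode names must be spelled exactly as in the message.

```python
PASSWORD_MODES = {"LDAP", "CUSTOM"}  # modes named in the pyhive error message


def build_hive_kwargs(host, port=10000, username=None, password=None, auth="NONE"):
    """Return keyword arguments for a Hive connection (hypothetical helper)."""
    auth = auth.upper()  # assumption: mode names are upper case, as in the message
    kwargs = {"host": host, "port": port, "auth": auth}
    if username is not None:
        kwargs["username"] = username
    if auth in PASSWORD_MODES:
        if password is None:
            raise ValueError("auth={} needs a password".format(auth))
        kwargs["password"] = password
    # In every other mode the password is left out, as the message asks.
    return kwargs
```

The result could then be passed on as `hive.connect(**build_hive_kwargs("localhost", username="hive", auth="NOSASL"))`, where the host and user name are placeholders.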

@vrajat
Member

vrajat commented Feb 4, 2020

Can you confirm whether these errors also occur when you connect to Hive from a Python console, with no PIICatcher involved?

Can you confirm whether you can connect to Hive and run queries from a Python console?
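
A console check along these lines would take PIICatcher out of the picture. It is a minimal sketch: `hive_smoke_test` is a hypothetical helper that accepts any DB-API style `connect` callable, so the same function works with pyhive or another driver.

```python
def hive_smoke_test(connect, **connect_kwargs):
    """Open a connection, list the databases, and close the connection again."""
    connection = connect(**connect_kwargs)
    try:
        cursor = connection.cursor()
        cursor.execute("SHOW DATABASES")
        return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```

From a Python console, after `from pyhive import hive`, this could be called as `hive_smoke_test(hive.connect, host="localhost", port=10000, auth="NONE")`; the host, port and auth mode are placeholders for the actual setup. If this call fails with the same error, the problem lies in the pyhive or Hive configuration and not in PIICatcher.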

@jayeshagwan1
Author

Sure, I will confirm. I think there are similar open issues with pyhive. Is there an alternative to pyhive?

@vrajat
Member

vrajat commented Feb 4, 2020

@jayeshagwan1
Author

There is some issue with pyhive. I tried from plain Python and still get the same error.

image

@jayeshagwan1
Author

Is it specific to the OS? I haven't tried Linux or Ubuntu yet.

@vrajat
Member

vrajat commented Feb 4, 2020

I am not sure. I've used it on CentOS and it worked, but that was for a specific configuration of Hive. The OS or the configuration of Python/Hive could be the problem. I don't know how to help remotely without knowing the setup.

@vrajat
Member

vrajat commented Feb 4, 2020

Can you try impyla?
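
impyla is a DB-API client for HiveServer2, so it can talk to Hive as well as Impala. A rough sketch of connecting with it follows; `connect_hiveserver2` is a hypothetical wrapper, port 10000 is assumed, and `auth_mechanism="PLAIN"` assumes a HiveServer2 running with its default (non-Kerberos) authentication.

```python
def connect_hiveserver2(host, port=10000, user=None, connect=None):
    """Connect to HiveServer2 through impyla (hypothetical wrapper)."""
    if connect is None:
        # Imported lazily so the module loads even when impyla is not installed.
        from impala.dbapi import connect  # pip install impyla
    # Assumption: the server uses plain SASL authentication.
    return connect(host=host, port=port, user=user, auth_mechanism="PLAIN")
```

The returned connection offers the usual `cursor()`, `execute()` and `fetchall()` calls, so the queries used with pyhive should carry over unchanged.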

@jayeshagwan1
Author

Does this use Impala?

@jayeshagwan1
Author

I am trying on CentOS, but I get this error:

[Errno 14] problem making ssl connection
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: bintray--sbt-rpm. Please verify its path and try again

So I could not install anything. I tried a couple of things for SSL, but it's not working.

@vrajat
Member

vrajat commented Feb 5, 2020 via email

@vrajat
Member

vrajat commented Feb 5, 2020

FWIW, Superset uses pyhive: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/hive.py#L71

There are Hive-related issues there too, but in general it works. I still think there is something about your installation that pyhive does not work with.

@jayeshagwan1
Author

I will start working on Hive next week and keep you posted.

@zer0pool
Contributor

zer0pool commented Jul 1, 2020

@jayeshagwan1 Hello. I am wondering how this implementation is going. It would be great if this feature could be added soon.

@vrajat
Member

vrajat commented Jul 9, 2020

There hasn't been any progress on this feature. IIRC @jayeshagwan1 got stuck installing a test Hive cluster. @zer0pool will you be able to help out?

@vrajat
Member

vrajat commented Nov 9, 2021

Closing this as there is not much demand for Hive. There is more interest in Redshift, Snowflake and Trino.

@vrajat vrajat closed this as completed Nov 9, 2021