Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ClickHouse and other databases #230

Open
alexey-milovidov opened this issue Sep 19, 2023 · 16 comments
Open

Support ClickHouse and other databases #230

alexey-milovidov opened this issue Sep 19, 2023 · 16 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@alexey-milovidov
Copy link

I want to use this IDE with ClickHouse (which supports similar scenarios but is more powerful than DuckDB).

@tconbeer
Copy link
Owner

I like where you're going, but I'm not planning to support this at the moment. We take advantage of duckDB's in-process architecture and python APIs quite a bit, so it's not totally straightforward to plug in a different database (which would require connecting over http, etc.).

You're welcome to fork the project, or if you're willing to wait, probably some day around Harlequin v2 or v3 I'll fork this and provide a general-purpose SQL client that can connect to clickhouse, postgres, etc.

@alexey-milovidov
Copy link
Author

There is an in-process Python module of ClickHouse, named chdb: https://github.com/chdb-io/chdb

@tconbeer
Copy link
Owner

ok, interesting. That's closer, although still missing most of the APIs we use. I also don't understand the persistence model, if there is one? i.e., I don't think you can connect to an existing clickhouse db with chdb? seems designed to mostly operate on external files

@alexey-milovidov
Copy link
Author

Both external and persistent tables are available. E.g., you can create a MergeTree table that will be stored locally. Or you can communicate with external ClickHouse using the remote table function. Or read/write local and remote files.

@auxten
Copy link

auxten commented Sep 20, 2023

I'm willing to assist in integrating harlequin with chdb.

@tconbeer
Copy link
Owner

That's awesome, thanks so much. Let me noodle on this a little. First step is probably me enumerating all of the duckdb integrations so we can think through alternatives.

@Mause
Copy link
Contributor

Mause commented Sep 20, 2023

Worth mentioning that @alexey-milovidov is the Clickhouse CTO & designer, so he can probably fix any issues that you have with integrating Clickhouse as well (missing features, bugs, etc)

@lmangani
Copy link

ok, interesting. That's closer, although still missing most of the APIs we use. I also don't understand the persistence model, if there is one? i.e., I don't think you can connect to an existing clickhouse db with chdb? seems designed to mostly operate on external files

Hello @tconbeer
chdb can connect to clickhouse-servers as well as directly access local or remote files with compatible formats. It supports session persistence using temporary or persistent local storage. The APIs are indeed are still quite basic, but this could be a great opportunity for us to find ways to extend and improve chdb feature coverage, so any input is greatly welcome!

@tconbeer
Copy link
Owner

Spent some time on this.

This wasn't in my original plan, but there isn't any reason Harlequin can't be the client to any database. The in-process part isn't very important.

I'll create an interface (an ABC or two) for a Harlequin database adapter, and refactor the DuckDB parts to be an implementation of that interface. The interface will include some optional features that may not be relevant or may not be possible for other databases, like loading extensions, exporting query results (copy to), providing autocompletions, etc. We'll distribute these adapters as separate plug-ins.

At start-up, Harlequin will discover which plugins are loaded, and then which features are available and adapt accordingly. Users will select their plugin and provide connection parameters at the command line. It would be nice to specify the connection in a config file (#206) instead (using profiles or similar).

I do want to be clear this feels like v2 to me, and that'll happen on my own timeline, with many of the smaller features shipping first. If y'all want to accelerate that, you can either start contributing to the project or you can pay me (I work as a contractor for my day job already). If you're interested in the latter, please email me (ted at shandy dot eye-oh) and we can schedule a call.

Finally, creating a delightful interface for DuckDB is still my first priority, so if I feel like this refactor puts that at risk for whatever reason, I reserve the right to change tacks and abandon this plan.

Thanks for your excitement around Harlequin -- I'm glad that it's pushed me to think about this as something bigger.

@tconbeer tconbeer changed the title ClickHouse support Support ClickHouse and other databases Sep 21, 2023
@tconbeer tconbeer added the enhancement New feature or request label Sep 21, 2023
@tconbeer
Copy link
Owner

tconbeer commented Nov 13, 2023

The pieces are there now, although undocumented. I'm going to implement an SQLite adapter to pressure-test the API and build out a guide for creating adapters; that should happen over the next week or two. If you want to get cracking sooner than that, you can work off the following examples:

  • src/harlequin/adapter.py provides three ABCs that you need to implement: HarlequinAdapter, HarlequinConnection, and HarlequinCursor
  • src/harlequin_duckdb provides the DuckDB implementation of HarlequinAdapter, etc. Note that it's discovered as a plug-in via the entry-point declaration in the root pyproject.toml file:
  • src/harlequin_sqlite provides the SQLite implementation.
[tool.poetry.plugins."harlequin.adapter"]
duckdb = "harlequin_duckdb:DuckDbAdapter"
sqlite = "harlequin_sqlite:HarlequinSqliteAdapter"

There isn't a great, generalizable test harness (yet). I'll likely be making big changes/improvements to the tests of the project to make it easier to test other adapters as I develop the SQLite adapter. You can reference the tests in tests/adapter_tests/test_duckdb.py; you could also swap out the fixtures in tests/conftest.py to use different adapter fixtures.

@tconbeer
Copy link
Owner

@auxten @alexey-milovidov @lmangani this is ready for you, now.

The guide for contributing an adapter can be found here: https://harlequin.sh/docs/contributing/adapter-guide
A template repo is here: https://github.com/tconbeer/harlequin-adapter-template
My postgres adapter can be found as a reference here: https://github.com/tconbeer/harlequin-postgres

The template repo includes a minimal set of tests.

The guide should include everything else you need to know, but feel free to ask any questions, or we can set up a quick call.

@lmangani
Copy link

Thanks @tconbeer this is amazing! I'll try this as soon as I get a time pocket.

@dchimeno
Copy link

is someone working on this? I could do it, but I don't want duplicate efforts :)

@lmangani
Copy link

@dchimeno I don't think anyone started - at least not from the chdb side - so feel free!

@tconbeer tconbeer added the help wanted Extra attention is needed label Jan 4, 2024
@ThomAub
Copy link

ThomAub commented May 4, 2024

I started to get a basic ClickHouse adapter working you can find it here harlequin-clickhouse

@tconbeer It was great to look at the harlequin-postgres and use the adapter-template !

This is currently using clickhouse-driver without connection pool, it doesn't really provide completions and there is a lot to improve.

@auxten i wanted to go chdb all the way and use the remote option but this would require parsing the sql query to replace the FROM table_name with FROM remote(host, ..., table_name).
I think i will create another adapter for chdb/clickhouse-local

@auxten
Copy link

auxten commented May 5, 2024

Great to hear this! @ThomAub
If any question encountered, feel free to ping me or raise an issue for chdb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

7 participants