Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Add a feature ClickHouse database support #26
@@ Coverage Diff @@ ## master #26 +/- ## ========================================== - Coverage 61.91% 53.67% -8.25% ========================================== Files 60 66 +6 Lines 2610 3173 +563 ========================================== + Hits 1616 1703 +87 - Misses 984 1458 +474 - Partials 10 12 +2
Yes, sure, no problem at all
Understood, sorry, missed this idea. Hope to be more precise next time
First of all, thanks for putting it all this work.
I do have two concerns before I actually dig through the Clickhouse code:
- There are several refactorings that are not related to the Clickhouse code, which makes this PR a bigger job that it needs to be. Typically we prefer those changes to be separate PRs so that there isn't as much mixing of concerns. Especially since some of them seem to change Golang-style short names (1-2 letters even) for longer, descriptive names.
- Some of the formatting as changed in odd ways and some I am pretty sure are not
gofmtapproved. In particular, changing the format of the queries in TimescaleDB seems unnecessary and is harder to read (too much white space).
I haven't yet looked deeply at the Clickhouse code since I wanted to point these things out. Hopefully its not too much work to revert some of the more subjective changes and make sure it is run through gofmt.
Thanks for the review, and yes, I agree - there are some refactorings not directly related to the ClickHouse functionality. These refactorings originate from the code reversing process - when I've started with implementation, I had very vague understanding of the project, therefore each tool has to be reversed and understood. That's where small code-related refactorings are very handy. As Martin Fowler said: "Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent." So through those small refactorings comes code understanding. As to extracting those small refactoring into separated PRs - yes, of course this can be done, but those refactorings are not the goal by themselves and as Martin says: "You refactor because you want to do something else, and refactoring helps you do that other thing."
As a next step, I'll review all the changes, especially subjective ones and revert those that does not makes code better or are not liked by formatter.
Great, thank you! With regards to refactorings, some of them are definitely improvements -- spelling fixes, adding code comments where they are sparse before, and some of the name changes (e.g.,
In general I agree with the gist of what you're saying and definitely want to make the code approachable for others to contribute so we can definitely look at those changes in isolation and see which ones make sense. Just want to keep the PRs more narrowly focused.
This looks mostly good to me, assuming this is the best/most efficient way to connect to Clickhouse.
Can you elaborate on whether that is the case, or if you were mostly trying to re-use TimescaleDB scaffolding here? I am curious because a cursory search finds a few Clickhouse-specific adapters for Go, so I am wondering what the benefits or drawbacks of using those are? For example:
other libs here: https://clickhouse.yandex/docs/en/interfaces/third-party_client_libraries/
Since it seems to support a variety of connection methods (HTTP, JDBC, etc), did you see if there were performance differences?
ClickHouse exposes two API endpoints:
Native binary API is less (should actually be read as poorly) documented and thus very few driver implementations are based on it. Go adapter used is one of those native-API implementations. Native protocol is considered to be more effective and faster than HTTP. HTTP - on the other hand - is simpler, well documented and generally more welcomed by application developers. JDBC driver implementation, for example, is based over HTTP protocol under the hood. So, general idea is that native protocol is expected to be faster, however, there are no any numbers available (at my knowledge). This may be one of the possible future test-cases - compare speed of native vs HTTP protocols on the same tasks.
So my thoughts were like:
HTTP protocol can be touched sometime later
Just two nits.
Thanks for the explanations of why you chose this. I see this code was used in a blog post (I believe, unless some one else did it as well?), so presumably the performance was satisfactory.
CTO of Altinity Alexander Zaytsev wrote blogpost with detailed comparison of databases available in the test
Each of databases has an area where it shines! Timescale and ClickHouse are in general faster than Influx, however Influx simply surprised with disk usage - it seems not to need any disk space at all!