Can I use the columnar store extension with timescales #89
Comments
The answer at this point in time is no. The data partitioning (chunking) TimescaleDB uses is optimized for indexing the data so that queries, especially as they increase in complexity, are performant across larger volumes of data. With a columnar store you lose almost all indexing (i.e., there is no B-tree support at all), so it doesn't make sense to combine the two models given our design decisions. We've had some internal engineering discussions about ideas for columnar storage, but it is not on any shorter-term roadmap.
Also, if you feel comfortable sharing the general structure of the data you are storing (and the relevant queries), we can take a closer look and make suggestions on how best to store that data in Timescale.
I recognize that columnar storage is poor for certain workloads and better for others. My main issue is the cost of table scans when a given column-narrow ad-hoc query cannot be resolved by an index. I suspect the biggest win for my sort of queries would be if I could apply parallel disk reads plus filtering (in this case for 1 server with a 10-way disk array and many cores). This would be, without the hardware, similar to what Netezza does, i.e. parallel reads with filtering based on what part of a query can be run on a chunk of data, on each tightly coupled CPU <-> disk pair. At the moment, short of creating numerous indices across many rows, some queries will involve a linear table scan. A linear scan can work reasonably well with parallelism.
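To make the idea of parallel reads with per-chunk filtering concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `scan_chunk` and `parallel_scan` are hypothetical, the in-memory lists stand in for chunks on separate disks, and this is not how TimescaleDB or Netezza implement scans.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, predicate):
    """Linearly scan one chunk and keep only the rows that match."""
    return [row for row in chunk if predicate(row)]


def parallel_scan(chunks, predicate, workers=4):
    """Scan every chunk concurrently and merge the matches in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker filters its own chunk; no index is involved.
        per_chunk = pool.map(lambda c: scan_chunk(c, predicate), chunks)
        return [row for matches in per_chunk for row in matches]


# Hypothetical data: 10 chunks (think one per disk), rows are (id, value).
chunks = [
    [(i, i % 7) for i in range(start, start + 1000)]
    for start in range(0, 10_000, 1000)
]

# An ad-hoc filter that no index covers: every chunk is scanned in full.
hits = parallel_scan(chunks, lambda row: row[1] == 3, workers=10)
```

Threads are used here only to keep the sketch short. In CPython they mainly help when the per-chunk work is I/O-bound, which is the case being described: each scan waits on its own disk.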
Hi @tr8dr, sorry for the delay in responding. One of the lesser-advertised features in our recent 0.1.0 release is the ability to associate multiple Postgres tablespaces with a single hypertable, so that this single "table" can reside across multiple disks, and chunks belonging to this hypertable can then be queried in parallel. Better documentation is forthcoming for the new
@tr8dr I'm going to close out this issue unless there's anything else?
I have "wide" tables where some analytical queries would greatly benefit from columnar storage and other efficiencies. Can TimescaleDB work with the columnar extension? For example:
https://github.com/citusdata/cstore_fdw