
Can I use the columnar store extension with timescales #89

Closed
tr8dr opened this issue Jun 9, 2017 · 5 comments
tr8dr commented Jun 9, 2017

I have "wide" tables where some analytical queries would greatly benefit from columnar storage and other efficiencies. Can TimescaleDB work with a columnar extension? For example:

https://github.com/citusdata/cstore_fdw
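For reference, cstore_fdw exposes columnar storage through Postgres's foreign data wrapper interface. A minimal sketch of its setup (table and column names here are illustrative, not from this thread):

```sql
-- Load the extension and create a foreign server backed by cstore_fdw
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- A "wide" table stored column-by-column, with compression enabled
CREATE FOREIGN TABLE ticks_columnar (
    time   timestamptz,
    symbol text,
    bid    double precision,
    ask    double precision
)
SERVER cstore_server
OPTIONS (compression 'pglz');
```

Because the data lives behind the FDW rather than in heap tables, it does not pass through TimescaleDB's chunking machinery.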

@stalltron

The answer at this point in time is no. The data partitioning (chunking) TimescaleDB uses is optimized for indexing the data, so that queries remain performant across large volumes of data, especially as they increase in complexity. With a columnar store you lose almost all indexing (e.g., there is no B-tree support at all), so it doesn't make sense to combine the two models given our design decisions. We've had some internal engineering discussions about ideas for columnar storage, but it is not on any shorter-term roadmap.
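For contrast, the hypertable model leans on ordinary B-tree indexes over time-partitioned chunks. A minimal sketch (names are illustrative):

```sql
CREATE TABLE conditions (
    time        timestamptz NOT NULL,
    device_id   text,
    temperature double precision
);

-- Partition the table into time-based chunks; queries can then
-- prune chunks by time and use per-chunk B-tree indexes.
SELECT create_hypertable('conditions', 'time');
CREATE INDEX ON conditions (device_id, time DESC);
```

It is this reliance on per-chunk indexes that a pure columnar layout would forgo.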

@akulkarni
Member

Also, if you feel comfortable sharing the general structure of the data you are storing (and the relevant queries), we can take a closer look and suggest how best to store that data in Timescale.

@tr8dr
Author

tr8dr commented Jun 10, 2017

I recognize that columnar storage is poor for certain workloads and better for others. My main issue is the cost of table scans when a column-narrow ad-hoc query cannot be resolved by an index.

I suspect the biggest win for my sort of queries would be the ability to apply parallel disk reads plus filtering (in this case on one server with a 10-way disk array and many cores). This would be similar, without the specialized hardware, to what Netezza does: parallel reads, with filtering pushed down to whatever part of a query can run on a chunk of data, on each tightly coupled CPU <-> disk pair.

At the moment, short of creating numerous indices across many columns, some queries will involve a linear table scan. A linear scan can work reasonably well with parallelism.
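Stock PostgreSQL (9.6 and later) can already parallelize sequential scans across worker processes, which is one way to get part of this behavior without columnar storage. A sketch, assuming a hypothetical `ticks` table:

```sql
-- Allow up to 8 workers per Gather node for this session
SET max_parallel_workers_per_gather = 8;

-- On a sufficiently large table, the planner can choose a
-- parallel sequential scan for an unindexed filter + aggregate
EXPLAIN (COSTS OFF)
SELECT avg(bid) FROM ticks WHERE symbol = 'AAPL';
-- With enough data, the plan contains a Gather node
-- over a Parallel Seq Scan on ticks.
```

Whether the filter is actually pushed to each worker, and how well I/O parallelizes across a disk array, depends on the storage layout and planner costing.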

@mfreed
Member

mfreed commented Jul 6, 2017

Hi @tr8dr, sorry for the delay in responding.

One of the lesser-advertised features in our recent 0.1.0 release is the ability to associate multiple Postgres tablespaces with a single hypertable, so that this single "table" can reside across multiple disks, and chunks belonging to the hypertable can then be queried in parallel.

Better documentation is forthcoming for the new attach_tablespace() API command, but if you are interested in the details:

71c5e78
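Based on the description above, usage looks roughly like the following sketch (tablespace names and paths are illustrative, and the exact `attach_tablespace()` signature may have changed since 0.1.0):

```sql
-- One tablespace per physical disk
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Spread the hypertable's chunks across the attached tablespaces,
-- so chunk scans can proceed in parallel on separate disks
SELECT attach_tablespace('disk1', 'conditions');
SELECT attach_tablespace('disk2', 'conditions');
```

New chunks are then placed across the attached tablespaces rather than all on the default one.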

@mfreed
Member

mfreed commented Aug 15, 2017

@tr8dr I'm going to close out this issue unless there's anything else?

@mfreed mfreed closed this as completed Sep 20, 2017
syvb pushed a commit to syvb/timescaledb that referenced this issue Sep 8, 2022
89: Adding compound aggregate for uddsketch r=WireBaron a=WireBaron

This change also fixes some errors in the udd sketch combining code.

Co-authored-by: Brian Rowe <brian@timescale.com>