
Design/status/roadmap on the clustered version? #9

Closed
blurrcat opened this issue Mar 29, 2017 · 14 comments

Comments

@blurrcat

It is mentioned in the FAQ that the clustered version is in active development. Can you share more details on the design, status, or roadmap? The technical paper doesn't seem to shed much light on this for me. Am I missing something?

Anyway, great job TimescaleDB!

@akulkarni
Member

The technical paper describes both our single-node design and the clustered design. But if something is unclear, happy to answer any questions.

We haven't announced a timeline for the open-source clustered version yet, but it's on the order of months.

@archenroot

@akulkarni Hi guys, I know you are quite busy, but is there any news on scaling TimescaleDB out across multiple nodes? The general idea is to do something like Cassandra: when the load increases, I just plug new nodes into the cluster, let them sync, and go. If selected, I plan to run TimescaleDB in a Mesos & Docker environment...

@akulkarni
Member

Hi @archenroot, are you worried about ingest load or query load?

For ingest load, our single-node instance regularly handles 100k inserts/sec (400k inserts/sec under certain configurations), which according to our benchmarks is equivalent to several Cassandra nodes. But let me know if you need more than that.

For query load, we support read clustering using PostgreSQL streaming replication; you would then round-robin your queries across the replicas to increase query throughput.
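
As a rough illustration of that read-clustering setup (a minimal sketch, assuming standard PostgreSQL streaming replication; the role name and settings below are placeholders, not a TimescaleDB-specific API):

```sql
-- On the primary: a role for the standbys to connect with.
-- (postgresql.conf would also need wal_level = replica and max_wal_senders
--  at least equal to the number of standbys.)
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';

-- Each standby is seeded with pg_basebackup, points primary_conninfo at the
-- primary, and then serves read-only queries.

-- Back on the primary: verify the standbys are streaming.
SELECT client_addr, state, sync_state FROM pg_stat_replication;
```

The round-robin itself would live in the application or in a connection pooler sitting in front of the replicas.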

@archenroot

archenroot commented Oct 12, 2017

@akulkarni
thanks for the answer, first of all.

So, let's imagine I am doing an IoT project where the sensor data (including images from cameras) will reach a volume of ~60,000,000 unique records per day (I am counting the target scale, of course).
This represents, cumulatively:
  • 1 year - 21,900,000,000 records
  • 5 years - 109,500,000,000 records

Writes:
As you suggested, TimescaleDB is capable of high write rates (100k inserts/sec), which as a number looks fine, but the question is also about hardware sizing. Let's look at it from the service (backend-layer) perspective. I will initially deploy, say, 3 active-active instances; as the load of events coming from devices increases over time, I will simply deploy more containers (Docker) within the Mesos (DC/OS) infrastructure, so horizontal scalability is not a big deal. Based on monitored load you can not only add instances but also dynamically remove them to save money when the load is low (the same applies to the message broker network, imagine you are in AWS, etc.).

Writes Question:
But how do I scale TimescaleDB properly? Is it even scalable in a similar way to Spring Boot-based microservices? Or is it better for me to buy some powerful 4-socket servers up front, which on the other hand won't be fully utilized from the beginning of the project?

Reads:
Good point here; in the case of PG streaming replication, the slave instances are also TimescaleDB, right?

Multitenancy:
That is more a question of how to design the data. I previously worked with the Oracle 11g Virtual Private Database framework to implement security not at the application level (where the application fetches all data and filters it afterwards) but purely at the DB level (the client id within the session is recognized by the database, which then returns only that client's data). In Oracle you can do whatever you want; it is not only about hiding data at the row or column level but even at the cell level. On the other hand it is quite tricky, and I understand this goes beyond TimescaleDB; I am just pointing it out, as this will be the approach in the design if possible. Is TimescaleDB compatible with any kind of tenant mechanism (either natively or via integration)?

Multitenancy is a bit off topic, though, and there is a nice discussion here:
https://stackoverflow.com/questions/44524364/postgresqls-schemas-for-multi-tenant-applications

In our case it looks more like we have a dozen tenants, each with a relatively small amount of data (gross numbers):

  • 100,000 records per year
  • 500,000 records per 5 years

Off topic - can I combine TimescaleDB with CONTINUOUS VIEWS from PipelineDB?
Imagine I create one database for all the writes based on TimescaleDB and a second database based on PipelineDB; it should theoretically be possible to create continuous views in PipelineDB whose source is the READ SLAVE nodes of TimescaleDB. Is that possible? Even a suggestion here helps; I will test it myself of course :-) The main idea is to provide real-time AGGREGATION updates over a specific time window.

Last note: I expect no or minimal UPDATE statements in the system.

@akulkarni
Member

Hi @archenroot,

Apologies for the delay in responding. Reopening this issue so it doesn't get lost again.

Responses:

1. Storage capacity - Two things worth mentioning here: First, we support data retention policies that allow you to aggregate data into lower granularities and age out (i.e., delete) the raw data; see the retention sketch after this list. Second, we support moderate elastic scaling via network-attached disks.

2. Writes - We currently do not support horizontal scale-out for writes. But according to my calculations, you are doing <1,000 inserts/sec (60,000,000/day ≈ 700/sec), so our single-node performance should be more than enough. (If instead you are looking at clustering for redundancy, then we would suggest setting up a read slave for failover.)

3. Reads - Yes, the read/slave instances would also have to be TimescaleDB.

4. Multi-tenancy - We have not yet built any native functionality for multi-tenancy, outside of what may already be available in PostgreSQL; a row-level security sketch follows this list. Currently our users who require this have been handling it themselves at the application layer. There may also be multi-tenant add-on options for PostgreSQL, but we haven't been able to research those yet.

5. PipelineDB - I'm not sure if you're suggesting making the read slave just PipelineDB, or TimescaleDB + PipelineDB. If the former, I don't know how it would handle time-series data - we have limited experience there. If the latter, it is possible it may work, but again it needs to be tested. (But if you do try it out yourself, please let us know how it goes.)
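
To make point 1 concrete: a minimal sketch of that retention pattern, assuming a hypertable named `conditions`, a hypothetical rollup table `conditions_hourly`, and the `drop_chunks()` call from the TimescaleDB API of that era (its exact signature has changed across versions):

```sql
-- Roll raw readings up into a coarser table before aging them out
-- ("conditions" and "conditions_hourly" are hypothetical tables).
INSERT INTO conditions_hourly (bucket, device_id, avg_temp)
SELECT time_bucket('1 hour', time), device_id, avg(temperature)
FROM conditions
WHERE time < now() - interval '3 months'
GROUP BY 1, 2;

-- Then drop the raw chunks older than three months.
SELECT drop_chunks(interval '3 months', 'conditions');
```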
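
And for point 4, a minimal sketch of what "already available in PostgreSQL" can look like, using standard row-level security; the `tenant_id` column and the `app.tenant_id` session setting are assumptions for illustration:

```sql
-- Each row carries a tenant_id; a session only sees its own tenant's rows.
ALTER TABLE conditions ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON conditions
    USING (tenant_id = current_setting('app.tenant_id')::int);

-- The application tags the session before querying:
SET app.tenant_id = '42';
SELECT count(*) FROM conditions;  -- returns only tenant 42's rows
```

Note that policies do not apply to superusers or the table owner unless FORCE ROW LEVEL SECURITY is also set.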

@jrevault

Hi @akulkarni,
I would second @archenroot's thoughts: being able to add nodes seamlessly would help in cases of significant new data acquisition, to avoid migrating the server each time it grows beyond its capacity.
The consequence for us today is that we must 'pay' for a bigger server just because I cannot easily add new nodes... if I pay only for a right-sized server, I know I'll have to migrate every 2 months (and I would really rather do something other than migrating data all the time 😅), so we pay more 😞

And thanks for the wonderful work with Timescale 👍

@archenroot

If you are familiar with architectures like Mesos or DC/OS: usually you are running either storage or services and monitoring how they are used; when you reach a metrics threshold, you just deploy one or more instances and the system reconfigures the cluster on the fly. The same happens when the load drops: you remove instances, as they are no longer needed. That is the beauty of autoscalable systems.

@mfreed
Member

mfreed commented Nov 14, 2017

@jrevault One interesting aspect of even TimescaleDB's design today is that you can elastically add disks to a hypertable, so storage capacity can scale with your storage needs. We've seen this applied particularly effectively with network-attached storage (e.g., EBS, Azure Storage) in the cloud.

@archenroot Yep, quite familiar with both of those cluster-management services. That said, scaling up/down a storage system like Timescale is quite a bit different than a stateless, shared-nothing service like your web front-end or API service, as membership changes necessitate application-level support to ensure proper consistency in the stateful storage tier. :)
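
A minimal sketch of that elastic-disk point, using the `attach_tablespace()` call from the hypertable-management API; the tablespace name, mount path, and the `conditions` hypertable are placeholders:

```sql
-- A tablespace backed by a newly attached (e.g., network-attached) volume.
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Attach it to the hypertable; newly created chunks are then placed across
-- the hypertable's attached tablespaces.
SELECT attach_tablespace('disk2', 'conditions');
```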

@jrevault

@archenroot and @mfreed (sorry for the late answer, but we were going live this weekend)

Our problem isn't adding disk space; it's more a memory and CPU one (well, at least that's my understanding of it).
As users grow we will have to process more and more data.
If, each month over 2 years, we multiply the initial data acquisition rate by 5, we will regularly need more memory and CPU to handle it, meaning changing the server size and adapting the Postgres conf (max_connections, shared_buffers, effective_cache_size, etc.), so I was thinking about horizontal scalability.
But maybe I'm not looking at the problem from the right angle, because at the start we won't have much data to ingest.
To give an idea of the quantity of data to ingest (it's not big), we store students' exercise answers.
In about 2 months we should ingest ~24,000,000 exercises per month, meaning at the user peak (mostly 4 hours a day) ~40 inserts/sec (not really impressive); a small server can handle it.
5 months later, we should have ~1,000 inserts/sec
5 months later, we should have ~2,000 inserts/sec
and so on...
until we reach a stabilization point that is not yet known.

@aleksve

aleksve commented Apr 13, 2018

Thanks for the interesting ideas! I am evaluating TimescaleDB for a fairly large data collection project. I expect a fairly moderate data inflow rate, maybe 20,000 entries per second. But we want to retain the historical data to the extent possible. 20,000 entries per second accumulate quickly; we expect about 100 TB of data per year. @akulkarni mentioned something very relevant in the post of Oct 24, 2017: TimescaleDB supports network-attached storage, which could of course be scaled up to several petabytes. Judging from the TimescaleDB API [1], storing some chunks on a NAS is doable. But is it possible to make TimescaleDB differentiate between the tablespaces? Can TimescaleDB be configured to create new chunks on the local RAID storage and move them to the NAS when the data gets older/colder?

Thanks!

[1] Hypertable management, attach_tablespace() https://docs.timescale.com/v0.9/api#hypertable-management

@murugesan70

Hi @akulkarni @mfreed,
We at Plume Designs (a cloud-managed WiFi company) are evaluating TimescaleDB for our time-series DB, and I have the same feature request as others.

  1. Horizontally scale TimescaleDB for our ever-growing cloud infrastructure with a 200+ TB time-series DB. Apart from vertical scaling, we do want to scale horizontally across a number of servers (beyond active-active cluster failover).
    When do you actually plan to roll out horizontal scaling for TimescaleDB?

Thanks

@archenroot

@murugesan70 - I am not sure, but it looks like VoltDB supports horizontal partitioning of tables. Take a look.

@dianasaur323
Contributor

dianasaur323 commented Jan 10, 2019

Hi everyone! I'm the PM heading up clustering. I'm looking for a couple people to help us provide feedback as we move forward with this. If you'd like to be considered as part of that (we can only take a couple people), please let me know! I'm available in our Slack community as Diana Hsieh, or you can ping me at diana at timescale.com.

@dianasaur323
Contributor

Hello~~ Public facing documentation is out for our private beta of clustering (which is currently only in limited release). Please go ahead and take a look!

https://docs.timescale.com/clustering/introduction/architecture#timescaledb-clustering and
https://docs.timescale.com/clustering/getting-started/scaling-out
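
For later readers: a rough sketch of the flow those scaling-out docs describe, run from the access node. The node names and hosts are placeholders, and the exact multi-node API in the private beta may differ from the 2.0-era calls shown here:

```sql
-- Register the data nodes with the access node (hostnames are placeholders).
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');

-- Create a hypertable whose chunks are distributed across the data nodes,
-- partitioned by time and by device.
CREATE TABLE conditions (
    time        timestamptz      NOT NULL,
    device_id   text             NOT NULL,
    temperature double precision
);
SELECT create_distributed_hypertable('conditions', 'time', 'device_id');

-- Reads and writes then go through the access node as usual.
INSERT INTO conditions VALUES (now(), 'dev-1', 21.5);
```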

@dianasaur323 dianasaur323 added this to the 2.0.0 milestone Jul 16, 2019
cevian added a commit to cevian/timescaledb that referenced this issue Oct 16, 2019
Whenever paths are added to a rel, that path or another
path that previously was on the rel can be freed.

Previously, the compressed rel's paths could be freed when it was re-planned
by the postgres planner after being created and planned by us. The new path
the postgres planner added was cheaper and overwrote and pfreed the old path
which we created and saved as a child path of the decompress node. Thus
we ended up with a dangling reference to a pfreed path.

This solution prevents this bug by removing the path we create from the
compressed rel. Thus, the chunk rel now "owns" the path. Note that this
does not prevent the compressed rel from being replanned and thus some
throw-away planner work is still happening. But that's a battle for
another day.

The backtrace for the core planner overwriting our path is:
```
frame #4: 0x0000000105c4ed0f postgres`pfree(pointer=0x00007fe20d01a628) at mcxt.c:1035
  * frame #5: 0x000000010594c998 postgres`add_partial_path(parent_rel=0x00007fe20d01ae10, new_path=0x00007fe20f800298) at pathnode.c:844
    frame #6: 0x00000001058ede4b postgres`create_plain_partial_paths(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10) at allpaths.c:753
    frame #7: 0x00000001058edb93 postgres`set_plain_rel_pathlist(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10, rte=0x00007fe20d0198c0) at allpaths.c:727
    frame #8: 0x00000001058ed78b postgres`set_rel_pathlist(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10, rti=13, rte=0x00007fe20d0198c0) at allpaths.c:452
    frame #9: 0x00000001058e8e16 postgres`set_base_rel_pathlists(root=0x00007fe2113fc668) at allpaths.c:310
    frame #10: 0x00000001058e8b49 postgres`make_one_rel(root=0x00007fe2113fc668, joinlist=0x00007fe20d0121c8) at allpaths.c:180
    frame #11: 0x000000010591ee77 postgres`query_planner(root=0x00007fe2113fc668, tlist=0x00007fe2113fcb58, qp_callback=(postgres`standard_qp_callback at planner.c:3492), qp_extra=0x00007ffeea6ba2b8) at planmain.c:265
    frame #12: 0x00000001059229cb postgres`grouping_planner(root=0x00007fe2113fc668, inheritance_update=false, tuple_fraction=0) at planner.c:1942
    frame #13: 0x0000000105920546 postgres`subquery_planner(glob=0x00007fe218000328, parse=0x00007fe218000858, parent_root=0x0000000000000000, hasRecursion=false, tuple_fraction=0) at planner.c:966
    frame #14: 0x000000010591f1e7 postgres`standard_planner(parse=0x00007fe218000858, cursorOptions=256, boundParams=0x0000000000000000) at planner.c:405
    frame #15: 0x000000010642d9b4 timescaledb-1.5.0-dev.so`timescaledb_planner(parse=0x00007fe218000858, cursor_opts=256, bound_params=0x0000000000000000) at planner.c:152
```
cevian added a commit that referenced this issue Oct 28, 2019
cevian added a commit that referenced this issue Oct 28, 2019
cevian added a commit that referenced this issue Oct 29, 2019
@erimatnor erimatnor removed this from the 2.0.0 milestone Jan 31, 2020
@bboule bboule closed this as completed Feb 19, 2020