Deprecate adaptive chunking.
The current implementation of adaptive chunking, as often employed by users,
is a failed experiment. The adaptation we introduced
does not work well in backfill scenarios or with even slightly out-of-order
data, and leads to unpredictable performance in many scenarios.
Since this feature was in BETA, we decided to deprecate it and its current API
altogether to ensure overall stability in the system, with the potential
to revisit/reimplement it in the future.
cevian committed Jan 22, 2019
1 parent b1bbe96 commit b7df2de
Showing 4 changed files with 17 additions and 233 deletions.
95 changes: 15 additions & 80 deletions api.md
@@ -193,15 +193,15 @@ still work on the resulting hypertable.
|---|---|
| `partitioning_column` | Name of an additional column to partition by. If provided, the `number_partitions` argument must also be provided. |
| `number_partitions` | Number of hash partitions to use for `partitioning_column`. Must be > 0. |
| `chunk_time_interval` | Interval in event time that each chunk covers. Must be > 0. As of Timescale v0.11.0, default is 7 days, unless [adaptive chunking (BETA)][adaptive-chunking] is enabled, in which case the interval starts at 1 day. For previous versions, default is 1 month. |
| `chunk_time_interval` | Interval in event time that each chunk covers. Must be > 0. As of Timescale v0.11.0, default is 7 days, unless adaptive chunking (DEPRECATED) is enabled, in which case the interval starts at 1 day. For previous versions, default is 1 month. |
| `create_default_indexes` | Boolean whether to create default indexes on time/partitioning columns. Default is TRUE. |
| `if_not_exists` | Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE. |
| `partitioning_func` | The function to use for calculating a value's partition.|
| `associated_schema_name` | Name of the schema for internal hypertable tables. Default is "_timescaledb_internal". |
| `associated_table_prefix` | Prefix for internal hypertable chunk names. Default is "_hyper". |
| `migrate_data` | Set to `true` to migrate any existing `main_table` data to chunks in the new hypertable. A non-empty table will generate an error without this option. Note that, for large tables, the migration might take a long time. Default is FALSE. |
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size, e.g., `2GB` will enable [adaptive chunking (BETA)][adaptive-chunking]. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for [adaptive chunking (BETA)][adaptive-chunking]. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |
| `chunk_target_size` | DEPRECATED - The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size (e.g., `2GB`) will enable adaptive chunking (a DEPRECATED feature). The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_sizing_func` | DEPRECATED - Allows setting a custom chunk sizing function for adaptive chunking (a DEPRECATED feature). The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |

#### Returns

@@ -253,21 +253,7 @@ millisecond, nanosecond, etc.). So if your time column is the number
of milliseconds since the UNIX epoch, and you wish each chunk to
cover 1 day, you should specify `chunk_time_interval => 86400000`.

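For example, a minimal sketch (the `conditions_ms` table and its columns are illustrative, not from the docs):

```sql
-- Hypothetical table whose integer "time" column stores milliseconds
-- since the UNIX epoch.
CREATE TABLE conditions_ms (
  time        BIGINT           NOT NULL,
  device_id   TEXT             NOT NULL,
  temperature DOUBLE PRECISION
);

-- One chunk per day: 24 * 60 * 60 * 1000 = 86,400,000 ms.
SELECT create_hypertable('conditions_ms', 'time',
       chunk_time_interval => 86400000);
```
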
The units of `chunk_target_size` follow the format of a ["Numeric with
Unit"][memory-units] memory settings parameter in PostgreSQL. Note
that this size includes indexes when using the default adaptive
chunking (BETA) algorithm. Be careful with specifying a plain number
(`100000000`) rather than one with units (`100MB`), as the base unit
is the size of a disk block (typically 8k, in which case 100,000,000 *
8k ~ 800GB).

<!-- -->
>:TIP: Setting a reasonable initial `chunk_time_interval` is important
even with adaptive chunking (BETA) enabled, because it allows the adaptive
algorithm to reach the target chunk size more quickly. It is better to
set a `chunk_time_interval` that is too small rather than one that is too
large. If no `chunk_time_interval` is set with adaptive chunking, the
default initial interval is 1 day.

In case of hash partitioning (i.e., `number_partitions` is greater
than zero), you can optionally specify a custom partitioning
@@ -281,14 +267,10 @@ and return a positive `integer` hash value. Note that this hash value
is _not_ a partition ID, but rather the inserted value's position in
the dimension's key space, which is then divided across the partitions.

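As an illustrative sketch (the `my_partition_hash` function is a hypothetical example, not part of the API):

```sql
-- Hypothetical custom partitioning function. It must take a single
-- anyelement argument and return a positive integer position in the
-- dimension's key space (not a partition ID).
CREATE OR REPLACE FUNCTION my_partition_hash(val anyelement)
    RETURNS INTEGER LANGUAGE SQL IMMUTABLE AS
$$
    SELECT abs(hashtext(val::text));
$$;

SELECT create_hypertable('conditions', 'time',
       partitioning_column => 'device_id',
       number_partitions   => 4,
       partitioning_func   => 'my_partition_hash');
```
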
[Adaptive chunking (BETA)][adaptive-chunking] can be enabled by setting the
`chunk_target_size` to a non-zero human-readable value, e.g.,
`2GB`. The `chunk_target_size` should ideally not exceed the setting
of `shared_buffers`. It is also possible to set `chunk_target_size` to
`estimate`, in which case the system determines a suitable chunk
target size based on the PostgreSQL `shared_buffers` setting.
Optionally, `chunk_sizing_func` can also be set to use a custom
algorithm for adapting the chunk size.
The adaptive chunking feature is now deprecated, and we strongly
discourage use of the `chunk_target_size` and
`chunk_sizing_func` parameters. They may be removed altogether in
the future.

<!-- -->
>:TIP: The `add_dimension` function can be used following hypertable
@@ -334,12 +316,6 @@ if `conditions` is already a hypertable:
SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE);
```
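
A further illustrative sketch combining several of the optional arguments above (assuming `conditions` already exists as a plain table with a `time` column):

```sql
-- Use 1-day chunks, skip the default indexes, and migrate any rows
-- already present in the table into chunks.
SELECT create_hypertable('conditions', 'time',
       chunk_time_interval    => interval '1 day',
       create_default_indexes => FALSE,
       migrate_data           => TRUE);
```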

Convert table `conditions` to hypertable with adaptive chunking
enabled and a chunk target size of `2GB`:
```sql
SELECT create_hypertable('conditions', 'time', chunk_target_size => '2GB');
```

#### Best Practices [](create_hypertable-best-practices)

Users of TimescaleDB often have two common questions:
@@ -352,16 +328,7 @@ the manual and automated adaptation of its time intervals. With
manually-set intervals, users should specify a `chunk_time_interval`
when creating their hypertable (the default value is 1 week). The
interval used for new chunks can be changed by calling
[`set_chunk_time_interval()`](#set_chunk_time_interval). With
automatically adapted intervals (which are not enabled by default),
the user should specify a `chunk_target_size` and the chunk interval
will be adapted for future chunks with the specified
`chunk_time_interval` as a starting point, or 1 day if not
specified. The settings for adaptive chunking (BETA) can be changed by
calling [`set_adaptive_chunking()`](#set_adaptive_chunking).

The following instructions apply if users are choosing to configure
time intervals manually.
[`set_chunk_time_interval()`](#set_chunk_time_interval).

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks, if using space
partitions) fits into memory. As such, we typically recommend setting
@@ -665,8 +632,11 @@ SELECT set_chunk_time_interval('conditions', 86400000);
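
For hypertables whose time column is a `TIMESTAMP` or `TIMESTAMPTZ`, the new interval can also be given as an `INTERVAL` (a brief sketch, assuming such a column):

```sql
-- Equivalent 1-day setting expressed as an INTERVAL.
SELECT set_chunk_time_interval('conditions', INTERVAL '24 hours');
```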

---

## set_adaptive_chunking() [](set_adaptive_chunking)
Changes the settings for [adaptive chunking (BETA)][adaptive-chunking]. The
## DEPRECATED set_adaptive_chunking() [](set_adaptive_chunking)

>:WARNING: The adaptive chunking feature is now deprecated and should not be used.
Changes the settings for adaptive chunking. The
function returns the configured chunk sizing function and the target
chunk size in bytes. This change will impact how and when new chunks
are created; it does not modify the intervals of existing chunks.
@@ -676,47 +646,13 @@ are created; it does not modify the intervals of existing chunks.
|Name|Description|
|---|---|
| `hypertable` | Identifier of hypertable to update the settings for.|
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size, e.g., `2GB` will enable [adaptive chunking (BETA)][adaptive-chunking]. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size (e.g., `2GB`) will enable adaptive chunking. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |

#### Optional Arguments [](set_adaptive_chunking-optional-arguments)
| Name | Description |
|---|---|
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for [adaptive chunking (BETA)][adaptive-chunking]. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for adaptive chunking. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |


#### Sample Usage [](set_adaptive_chunking-examples)

Enable adaptive chunking on hypertable `conditions` and estimate the
target chunk size based on system information:

```sql
SELECT * FROM set_adaptive_chunking('conditions', 'estimate');
chunk_sizing_func | chunk_target_size
------------------------------------------------+-------------------
_timescaledb_internal.calculate_chunk_interval | 536870912
```

Set the target chunk size to `1GB` on the `conditions` hypertable, using
a custom chunk sizing function (enabling adaptive chunking if
previously disabled):

```sql
SELECT * FROM set_adaptive_chunking('conditions', '1GB', 'custom_calculate_chunk_interval');
chunk_sizing_func | chunk_target_size
---------------------------------+-------------------
custom_calculate_chunk_interval | 1073741824

```

Disable adaptive chunking on the `conditions` hypertable, staying with the
current chunk time interval:

```sql
SELECT * FROM set_adaptive_chunking('conditions', 'off');
chunk_sizing_func | chunk_target_size
------------------------------------------------+-------------------
_timescaledb_internal.calculate_chunk_interval | 0
```

---

@@ -1740,7 +1676,6 @@ and then inspect `dump_file.txt` before sending it together with a bug report or
[postgres-tablespaces]: https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html
[postgres-createtablespace]: https://www.postgresql.org/docs/9.6/static/sql-createtablespace.html
[migrate-from-postgresql]: /getting-started/migrating-data
[adaptive-chunking]: /using-timescaledb/adaptive-chunking
[memory-units]: https://www.postgresql.org/docs/current/static/config-setting.html#CONFIG-SETTING-NAMES-VALUES
[telemetry]: /using-timescaledb/telemetry
[drop chunks]: #drop_chunks
5 changes: 2 additions & 3 deletions faq.md
@@ -180,8 +180,7 @@ Read our TimescaleDB-PostgreSQL benchmarks:

To summarize, TimescaleDB offers:
* Ease-of-use: TimescaleDB is far easier to use because creating partitions (or what we call
"chunks") is automatically performed for the user. Chunks will also be adaptively sized by
the system as data rates and volumes fluctuate. All of the complexity of automatic, adaptive
"chunks") is automatically performed for the user. All of the complexity of automatic
partitioning is abstracted away behind a "hypertable", which users interact with just as
they would with a PostgreSQL table.
* Much higher ingest scale: TimescaleDB sees throughput more than 20X that of
@@ -192,7 +191,7 @@ when running on a single server. In particular, vanilla PostgreSQL has poor write performance
for moderate tables, and this problem only becomes worse over time as data volume grows
linearly in time. These problems emerge when table indexes can no longer fit in memory,
as each insert will translate to many disk fetches to swap in portions of the indexes'
B-Trees. TimescaleDB solves this through its heavy (and adaptive) utilization of
B-Trees. TimescaleDB solves this through its heavy utilization of
time-space partitioning, even when running _on a single machine_. So all writes
to recent time intervals are only to tables that remain in memory, and updating any
secondary indexes is also fast as a result.
147 changes: 0 additions & 147 deletions using-timescaledb/adaptive-chunking.md

This file was deleted.

3 changes: 0 additions & 3 deletions using-timescaledb/hypertables.md
@@ -82,8 +82,6 @@ and 1 month prior to v0.11.0.
Alternatively, users can explicitly configure time intervals by
setting `chunk_time_interval` when creating a hypertable.
After the hypertable has been created, the interval used for new chunks can be changed by calling `set_chunk_time_interval`.
Furthermore, as of v0.11.0, users can enable [adaptive chunking (BETA)][adaptive-chunking],
which will automatically set chunk sizes under-the-hood, based on insert patterns.
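
A brief illustrative sketch of configuring the interval explicitly and changing it later (assuming a `conditions` table with a `TIMESTAMPTZ` time column):

```sql
-- Create the hypertable with an explicit 1-day chunk interval.
SELECT create_hypertable('conditions', 'time',
       chunk_time_interval => interval '1 day');

-- Later, widen the interval used for newly created chunks to 2 days;
-- intervals of existing chunks are unchanged.
SELECT set_chunk_time_interval('conditions', interval '2 days');
```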

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks, if using space
partitions) fits into memory. As such, we typically recommend setting
@@ -160,4 +158,3 @@ queries.
[postgres-createtable]: https://www.postgresql.org/docs/9.6/static/sql-createtable.html
[create_hypertable]: /api#create_hypertable
[migrate-from-postgresql]: /getting-started/migrating-data
[adaptive-chunking]: /using-timescaledb/adaptive-chunking
