Deprecate adaptive chunking.
The current implementation of adaptive chunking, as often employed by users,
is a failed experiment. The adaptation we introduced
does not work well in backfill scenarios or with even slightly out-of-order
data, and leads to unpredictable performance in many scenarios.
Since this feature was in BETA, we decided to deprecate it and its current API
altogether to ensure overall stability in the system, with the potential
to revisit/reimplement it in the future.
cevian committed Jan 22, 2019
1 parent b1bbe96 commit b7df2de
Showing 4 changed files with 17 additions and 233 deletions.
95 changes: 15 additions & 80 deletions api.md
@@ -193,15 +193,15 @@ still work on the resulting hypertable.
|---|---|
| `partitioning_column` | Name of an additional column to partition by. If provided, the `number_partitions` argument must also be provided. |
| `number_partitions` | Number of hash partitions to use for `partitioning_column`. Must be > 0. |
| `chunk_time_interval` | Interval in event time that each chunk covers. Must be > 0. As of Timescale v0.11.0, default is 7 days, unless [adaptive chunking (BETA)][adaptive-chunking] is enabled, in which case the interval starts at 1 day. For previous versions, default is 1 month. |
| `chunk_time_interval` | Interval in event time that each chunk covers. Must be > 0. As of Timescale v0.11.0, default is 7 days, unless adaptive chunking (DEPRECATED) is enabled, in which case the interval starts at 1 day. For previous versions, default is 1 month. |
| `create_default_indexes` | Boolean whether to create default indexes on time/partitioning columns. Default is TRUE. |
| `if_not_exists` | Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE. |
| `partitioning_func` | The function to use for calculating a value's partition.|
| `associated_schema_name` | Name of the schema for internal hypertable tables. Default is "_timescaledb_internal". |
| `associated_table_prefix` | Prefix for internal hypertable chunk names. Default is "_hyper". |
| `migrate_data` | Set to `true` to migrate any existing `main_table` data to chunks in the new hypertable. A non-empty table will generate an error without this option. Note that, for large tables, the migration might take a long time. Default is FALSE. |
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size, e.g., `2GB` will enable [adaptive chunking (BETA)][adaptive-chunking]. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for [adaptive chunking (BETA)][adaptive-chunking]. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |
| `chunk_target_size` | DEPRECATED - The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size (e.g., `2GB`) will enable adaptive chunking (a DEPRECATED feature). The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_sizing_func` | DEPRECATED - Allows setting a custom chunk sizing function for adaptive chunking (a DEPRECATED feature). The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |

#### Returns

@@ -253,21 +253,7 @@ millisecond, nanosecond, etc.). So if your time column is the number
of milliseconds since the UNIX epoch, and you wish each chunk to
cover 1 day, you should specify `chunk_time_interval => 86400000`.

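For example, a minimal sketch (the `conditions_ms` table and its columns are illustrative, not from the docs):

```sql
-- Hypothetical table whose integer "time" column stores milliseconds
-- since the UNIX epoch.
CREATE TABLE conditions_ms (
  time        BIGINT           NOT NULL,
  device_id   TEXT             NOT NULL,
  temperature DOUBLE PRECISION
);

-- One chunk per day: 24 * 60 * 60 * 1000 = 86,400,000 ms.
SELECT create_hypertable('conditions_ms', 'time',
       chunk_time_interval => 86400000);
```
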
The units of `chunk_target_size` follow the format of a ["Numeric with
Unit"][memory-units] memory settings parameter in PostgreSQL. Note
that this size includes indexes when using the default adaptive
chunking (BETA) algorithm. Be careful with specifying a plain number
(`100000000`) rather than one with units (`100MB`), as the base unit
is the size of a disk block (typically 8k, in which case 100,000,000 *
8k ~ 800GB).

<!-- -->
>:TIP: Setting a reasonable initial `chunk_time_interval` is important
even with adaptive chunking (BETA) enabled, because it allows the adaptive
algorithm to reach the target chunk size more quickly. It is better to
set a `chunk_time_interval` that is too small rather than one that is too
large. If no `chunk_time_interval` is set with adaptive chunking, the
default initial interval is 1 day.

In case of hash partitioning (i.e., `number_partitions` is greater
than zero), you can optionally specify a custom partitioning
@@ -281,14 +267,10 @@ and return a positive `integer` hash value. Note that this hash value
is _not_ a partition ID, but rather the inserted value's position in
the dimension's key space, which is then divided across the partitions.

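As an illustrative sketch (the `my_partition_hash` function is a hypothetical example, not part of the API):

```sql
-- Hypothetical custom partitioning function. It must take a single
-- anyelement argument and return a positive integer position in the
-- dimension's key space (not a partition ID).
CREATE OR REPLACE FUNCTION my_partition_hash(val anyelement)
    RETURNS INTEGER LANGUAGE SQL IMMUTABLE AS
$$
    SELECT abs(hashtext(val::text));
$$;

SELECT create_hypertable('conditions', 'time',
       partitioning_column => 'device_id',
       number_partitions   => 4,
       partitioning_func   => 'my_partition_hash');
```
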
[Adaptive chunking (BETA)][adaptive-chunking] can be enabled by setting the
`chunk_target_size` to a non-zero human-readable value, e.g.,
`2GB`. The `chunk_target_size` should ideally not exceed the setting
of `shared_buffers`. It is also possible to set `chunk_target_size` to
`estimate`, in which case the system determines a suitable chunk
target size based on the PostgreSQL `shared_buffers` setting.
Optionally, `chunk_sizing_func` can also be set to use a custom
algorithm for adapting the chunk size.
The adaptive chunking feature is now deprecated, and we strongly
discourage use of the `chunk_target_size` and
`chunk_sizing_func` parameters. They may be removed altogether in
the future.

<!-- -->
>:TIP: The `add_dimension` function can be used following hypertable
@@ -334,12 +316,6 @@ if `conditions` is already a hypertable:
SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE);
```
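
A further illustrative sketch combining several of the optional arguments above (assuming `conditions` already exists as a plain table with a `time` column):

```sql
-- Use 1-day chunks, skip the default indexes, and migrate any rows
-- already present in the table into chunks.
SELECT create_hypertable('conditions', 'time',
       chunk_time_interval    => interval '1 day',
       create_default_indexes => FALSE,
       migrate_data           => TRUE);
```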

Convert table `conditions` to hypertable with adaptive chunking
enabled and a chunk target size of `2GB`:
```sql
SELECT create_hypertable('conditions', 'time', chunk_target_size => '2GB');
```

#### Best Practices [](create_hypertable-best-practices)

Users of TimescaleDB often have two common questions:
@@ -352,16 +328,7 @@ the manual and automated adaptation of its time intervals. With
manually-set intervals, users should specify a `chunk_time_interval`
when creating their hypertable (the default value is 1 week). The
interval used for new chunks can be changed by calling
[`set_chunk_time_interval()`](#set_chunk_time_interval). With
automatically adapted intervals (which are not enabled by default),
the user should specify a `chunk_target_size` and the chunk interval
will be adapted for future chunks with the specified
`chunk_time_interval` as a starting point, or 1 day if not
specified. The settings for adaptive chunking (BETA) can be changed by
calling [`set_adaptive_chunking()`](#set_adaptive_chunking).

The following instructions apply if users are choosing to configure
time intervals manually.
[`set_chunk_time_interval()`](#set_chunk_time_interval).

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks, if using space
partitions) fits into memory. As such, we typically recommend setting
@@ -665,8 +632,11 @@ SELECT set_chunk_time_interval('conditions', 86400000);
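
For hypertables whose time column is a `TIMESTAMP` or `TIMESTAMPTZ`, the new interval can also be given as an `INTERVAL` (a brief sketch, assuming such a column):

```sql
-- Equivalent 1-day setting expressed as an INTERVAL.
SELECT set_chunk_time_interval('conditions', INTERVAL '24 hours');
```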

---

## set_adaptive_chunking() [](set_adaptive_chunking)
Changes the settings for [adaptive chunking (BETA)][adaptive-chunking]. The
## DEPRECATED set_adaptive_chunking() [](set_adaptive_chunking)

>:WARNING: The adaptive chunking feature is now deprecated and should not be used.
Changes the settings for adaptive chunking. The
function returns the configured chunk sizing function and the target
chunk size in bytes. This change will impact how and when new chunks
are created; it does not modify the intervals of existing chunks.
@@ -676,47 +646,13 @@ are created; it does not modify the intervals of existing chunks.
|Name|Description|
|---|---|
| `hypertable` | Identifier of hypertable to update the settings for.|
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size, e.g., `2GB` will enable [adaptive chunking (BETA)][adaptive-chunking]. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |
| `chunk_target_size` | The target size of a chunk (including indexes) in `kB`, `MB`, `GB`, or `TB`. Setting this to `estimate` or a non-zero chunk size (e.g., `2GB`) will enable adaptive chunking. The `estimate` setting will estimate a target chunk size based on system information. Adaptive chunking is disabled by default. |

#### Optional Arguments [](set_adaptive_chunking-optional-arguments)
| Name | Description |
|---|---|
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for [adaptive chunking (BETA)][adaptive-chunking]. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |
| `chunk_sizing_func` | Allows setting a custom chunk sizing function for adaptive chunking. The built-in chunk sizing function will be used by default. Note that `chunk_target_size` needs to be set to use this function. |


#### Sample Usage [](set_adaptive_chunking-examples)

Enable adaptive chunking on hypertable `conditions` and estimate the
target chunk size based on system information:

```sql
SELECT * FROM set_adaptive_chunking('conditions', 'estimate');
chunk_sizing_func | chunk_target_size
------------------------------------------------+-------------------
_timescaledb_internal.calculate_chunk_interval | 536870912
```

Set the target chunk size to `1GB` on the `conditions` hypertable, using
a custom chunk sizing function (enabling adaptive chunking if
previously disabled):

```sql
SELECT * FROM set_adaptive_chunking('conditions', '1GB', 'custom_calculate_chunk_interval');
chunk_sizing_func | chunk_target_size
---------------------------------+-------------------
custom_calculate_chunk_interval | 1073741824

```

Disable adaptive chunking on the `conditions` hypertable, staying with the
current chunk time interval:

```sql
SELECT * FROM set_adaptive_chunking('conditions', 'off');
chunk_sizing_func | chunk_target_size
------------------------------------------------+-------------------
_timescaledb_internal.calculate_chunk_interval | 0
```

---

@@ -1740,7 +1676,6 @@ and then inspect `dump_file.txt` before sending it together with a bug report or
[postgres-tablespaces]: https://www.postgresql.org/docs/9.6/static/manage-ag-tablespaces.html
[postgres-createtablespace]: https://www.postgresql.org/docs/9.6/static/sql-createtablespace.html
[migrate-from-postgresql]: /getting-started/migrating-data
[adaptive-chunking]: /using-timescaledb/adaptive-chunking
[memory-units]: https://www.postgresql.org/docs/current/static/config-setting.html#CONFIG-SETTING-NAMES-VALUES
[telemetry]: /using-timescaledb/telemetry
[drop chunks]: #drop_chunks
5 changes: 2 additions & 3 deletions faq.md
@@ -180,8 +180,7 @@ Read our TimescaleDB-PostgreSQL benchmarks:

To summarize, TimescaleDB offers:
* Ease-of-use: TimescaleDB is far easier to use because creating partitions (or what we call
"chunks") is automatically performed for the user. Chunks will also be adaptively sized by
the system as data rates and volumes fluctuate. All of the complexity of automatic, adaptive
"chunks") is automatically performed for the user. All of the complexity of automatic
partitioning is abstracted away behind a "hypertable", which users interact with just as
they would with a PostgreSQL table.
* Much higher ingest scale: TimescaleDB sees throughput more than 20X that of
@@ -192,7 +191,7 @@ when running on a single server. In particular, vanilla PostgreSQL has poor write performance
for moderate tables, and this problem only becomes worse over time as data volume grows
linearly in time. These problems emerge when table indexes can no longer fit in memory,
as each insert will translate to many disk fetches to swap in portions of the indexes'
B-Trees. TimescaleDB solves this through its heavy (and adaptive) utilization of
B-Trees. TimescaleDB solves this through its heavy utilization of
time-space partitioning, even when running _on a single machine_. So all writes
to recent time intervals are only to tables that remain in memory, and updating any
secondary indexes is also fast as a result.
147 changes: 0 additions & 147 deletions using-timescaledb/adaptive-chunking.md

This file was deleted.

3 changes: 0 additions & 3 deletions using-timescaledb/hypertables.md
@@ -82,8 +82,6 @@ and 1 month prior to v0.11.0.
Alternatively, users can explicitly configure time intervals by
setting `chunk_time_interval` when creating a hypertable.
After the hypertable has been created, the interval used for new chunks can be changed by calling `set_chunk_time_interval`.
Furthermore, as of v0.11.0, users can enable [adaptive chunking (BETA)][adaptive-chunking],
which will automatically set chunk sizes under-the-hood, based on insert patterns.
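
A brief illustrative sketch of configuring the interval explicitly and changing it later (assuming a `conditions` table with a `TIMESTAMPTZ` time column):

```sql
-- Create the hypertable with an explicit 1-day chunk interval.
SELECT create_hypertable('conditions', 'time',
       chunk_time_interval => interval '1 day');

-- Later, widen the interval used for newly created chunks to 2 days;
-- intervals of existing chunks are unchanged.
SELECT set_chunk_time_interval('conditions', interval '2 days');
```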

The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks, if using space
partitions) fits into memory. As such, we typically recommend setting
@@ -160,4 +158,3 @@ queries.
[postgres-createtable]: https://www.postgresql.org/docs/9.6/static/sql-createtable.html
[create_hypertable]: /api#create_hypertable
[migrate-from-postgresql]: /getting-started/migrating-data
[adaptive-chunking]: /using-timescaledb/adaptive-chunking
