[Bug]: error compressing wide table #4398

Closed
justinpryzby opened this issue May 29, 2022 · 8 comments · Fixed by #4696

@justinpryzby

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression

What happened?

We have wide tables (with up to ~1000 columns).
When I try to compress a table, it fails as shown in the log output below.

TimescaleDB version affected

2.7

PostgreSQL version used

14.3

What operating system did you use?

centos7

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

postgres=# SELECT compress_chunk(c) FROM show_chunks('widetable', INTERVAL '1 days') c;
2022-05-28 23:19:20.076 MST [10906] ERROR:  row is too big: size 46616, maximum size 8160
ERROR:  row is too big: size 46616, maximum size 8160
Time: 74773.753 ms (01:14.774)
postgres=# \errverbose 
ERROR:  54000: row is too big: size 46616, maximum size 8160
LOCATION:  RelationGetBufferForTuple, hio.c:361

Here's an old backtrace from when I sent this problem to priscila@timescale.com on 26 May 2020 asking that it be forwarded to developers.

> > > (gdb) bt
> > > #1  0x000000000087e440 in errfinish ()
> > > #2  0x00000000004cbb2d in RelationGetBufferForTuple ()
> > > #3  0x00000000004c01d8 in heap_insert ()
> > > #4  0x00007f2e6c22d294 in row_compressor_flush (row_compressor=0x7ffe20544010, mycid=561, changed_groups=true)
> > >     at /home/pryzbyj/timescaledb/tsl/src/compression/compression.c:784
> > > #5  0x00007f2e6c22e0fe in row_compressor_append_sorted_rows (sorted_desc=<optimized out>, sorted_rel=0x113f140, row_compressor=0x7ffe20544010)
> > >     at /home/pryzbyj/timescaledb/tsl/src/compression/compression.c:600
> > > #6  compress_chunk (in_table=<optimized out>, out_table=<optimized out>, column_compression_info=column_compression_info@entry=0x3933910, 
> > >     num_compression_infos=num_compression_infos@entry=536) at /home/pryzbyj/timescaledb/tsl/src/compression/compression.c:260
> > > #7  0x00007f2e6c230fbe in compress_chunk_impl (chunk_relid=19174912, hypertable_relid=<optimized out>)
> > >     at /home/pryzbyj/timescaledb/tsl/src/compression/compress_utils.c:232
> > > #8  tsl_compress_chunk_wrapper (chunk_relid=chunk_relid@entry=84253857, if_not_compressed=<optimized out>)
> > >     at /home/pryzbyj/timescaledb/tsl/src/compression/compress_utils.c:325
> > > #9  0x00007f2e6c23134e in tsl_compress_chunk (fcinfo=0x11d0ba0) at /home/pryzbyj/timescaledb/tsl/src/compression/compress_utils.c:334

How can we reproduce the bug?

No response

@svenklemm
Member

It seems like you are running into a PostgreSQL limitation here. The 8160-byte limit is the maximum size a row can have, and short of recompiling PostgreSQL with a different page size there is no way around it. With more than 1000 columns the likelihood of hitting this is pretty high, assuming an average field size of 8 bytes. But PostgreSQL supports storing data out of line (TOAST) to handle values larger than 8 kB, so I'm wondering why that is not happening here. Did you change the storage options for the fields?
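
A minimal way to check this from plain PostgreSQL (sketch only; 'widetable' is the hypertable from the report and 'c2' stands in for any column):

-- Inspect the per-column storage mode ('p' = plain, 'e' = external, 'x' = extended, 'm' = main).
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'widetable'::regclass AND attnum > 0;

-- If a column was switched away from the default, it can be reset to a toastable mode.
ALTER TABLE widetable ALTER COLUMN c2 SET STORAGE EXTENDED;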

@justinpryzby
Author

justinpryzby commented May 29, 2022 via email

@justinpryzby
Author

justinpryzby commented May 29, 2022 via email

@sb230132
Contributor

sb230132 commented Aug 10, 2022

Below is an explanation of what is happening.

Consider this example:

DROP TABLE foo_ts;
CREATE TABLE foo_ts (
      tm timestamp with time zone,
      c1 integer default 1234,
      c2 integer default 1234);
SELECT * FROM create_hypertable('foo_ts', 'tm');
ALTER TABLE foo_ts SET (timescaledb.compress, timescaledb.compress_segmentby='c1',
                        timescaledb.compress_orderby='tm');
INSERT INTO foo_ts SELECT now() + INTERVAL '11' DAY,
                          generate_series(1,10),
                          generate_series(1,10);
SELECT compress_chunk(c) FROM show_chunks('foo_ts') c;

Consider the above scenario, where we create a hypertable and enable compression on it. When compress_chunk() is called, the following happens:

  1. PostgreSQL first sorts all the tuples based on column 'tm'.
  2. The tuple length of the compressed table is calculated. It comes to approximately 45 bytes (tm column) + 4 bytes
     (c1 column) + 45 bytes (c2 column), plus a 24-byte header. For now I am ignoring byte alignment.
  3. TimescaleDB also adds 4 metadata columns, which add about 24 bytes.
  4. The total tuple length comes to approximately 148 bytes.
  5. In heap_insert(), where the final data is written out, PostgreSQL checks whether the compressed-table tuple needs to be toasted. In this case, since the tuple length is less than 8160 bytes (the maximum tuple length), no toasting happens.
  6. All data is written successfully to the compressed table.

Now assume the above table 'foo_ts' has 600 columns (tm, c1, c2 ... c599).
The total tuple length of the compressed table will be (45 * 599) bytes for columns tm, c2, c3 ... c599, plus a 24-byte header and 24 bytes of metadata. This exceeds the maximum tuple length of 8160 bytes.
The server therefore tries to toast the tuple. The toasted tuple length is calculated as (18 * 599) bytes of TOAST pointers, plus 4 bytes for the segmentby column, a 24-byte header and 24 bytes of metadata, which comes to roughly 10834 bytes.
The toasted tuple is still larger than the maximum tuple length supported by PostgreSQL, and thus we see the error:
ERROR: row is too big: size 10856, maximum size 8160
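
The same estimate spelled out as arithmetic (illustrative only, using the rough per-value sizes from above):

-- Estimated compressed row size before TOAST kicks in.
SELECT 45 * 599   -- compressed_data values for tm, c2 .. c599
     + 4          -- segmentby column c1, stored plain
     + 24         -- tuple header
     + 24         -- 4 metadata columns
     AS estimated_row_bytes;          -- about 27007, well above 8160

-- Estimated row size after TOAST replaces each value with an 18-byte pointer.
SELECT 18 * 599 + 4 + 24 + 24 AS estimated_toasted_row_bytes;   -- about 10834, still above 8160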

Tested on:
postgres version 14.0
timescaledb version 2.8.0

Hi Justin,
Thanks for the reproducible testcase.
You mentioned that you don't see any issue with vanilla PostgreSQL. Can you please share a similar testcase that I can run on PostgreSQL (without TimescaleDB) so I can better understand what is happening? That will give me more information to decide whether this is a bug or expected behaviour.

@justinpryzby
Author

justinpryzby commented Aug 10, 2022 via email

@sb230132
Contributor

sb230132 commented Aug 11, 2022

Hi Justin,

Consider a simple table like the one below:

CREATE TABLE foo_ts (
      tm timestamp with time zone,
      c1 integer default 1234,
      c2 char(10) default 'char(10)',
      c3 varchar(30) default 'varchar(30)',
      c4 date default '01-02-2022');

tsdb=# \d+ foo_ts
                                                               Table "public.foo_ts"
 Column |           Type           | Collation | Nullable |             Default              | Storage  | Compression | Stats target | Description 
--------+--------------------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
 tm     | timestamp with time zone |           |          |                                  | plain    |             |              | 
 c1     | integer                  |           |          | 1234                             | plain    |             |              | 
 c2     | character(10)            |           |          | 'char(10)'::bpchar               | extended |             |              | 
 c3     | character varying(30)    |           |          | 'varchar(30)'::character varying | extended |             |              | 
 c4     | date                     |           |          | '2022-01-02'::date               | plain    |             |              | 
Access method: heap

Note that the storage for the char and varchar columns is set to 'extended'. That means values for these attributes (i.e. columns) can be both toasted and compressed by PostgreSQL.

postgres=# select * FROM _timescaledb_catalog.hypertable;
(0 rows)

No internal TimescaleDB tables have been created yet.

Now create hypertable and enable compression on the table.

postgres=# 
postgres=# SELECT * FROM create_hypertable('foo_ts', 'tm');
NOTICE:  adding not-null constraint to column "tm"
DETAIL:  Time dimensions cannot have NULL values.
-[ RECORD 1 ]-+-------
hypertable_id | 84
schema_name   | public
table_name    | foo_ts
created       | t

postgres=# ALTER TABLE foo_ts SET (timescaledb.compress, timescaledb.compress_segmentby='c1', timescaledb.compress_orderby='tm');
ALTER TABLE

postgres=# select * FROM _timescaledb_catalog.hypertable;
-[ RECORD 1 ]------------+--------------------------
id                       | 85
schema_name              | _timescaledb_internal
table_name               | _compressed_hypertable_85
associated_schema_name   | _timescaledb_internal
associated_table_prefix  | _hyper_85
num_dimensions           | 0
chunk_sizing_func_schema | _timescaledb_internal
chunk_sizing_func_name   | calculate_chunk_interval
chunk_target_size        | 0
compression_state        | 2
compressed_hypertable_id | 
replication_factor       | 
-[ RECORD 2 ]------------+--------------------------
id                       | 84
schema_name              | public
table_name               | foo_ts
associated_schema_name   | _timescaledb_internal
associated_table_prefix  | _hyper_84
num_dimensions           | 1
chunk_sizing_func_schema | _timescaledb_internal
chunk_sizing_func_name   | calculate_chunk_interval
chunk_target_size        | 0
compression_state        | 1
compressed_hypertable_id | 85
replication_factor       | 

Notice that an internal TimescaleDB compression table ('_timescaledb_internal._compressed_hypertable_85') has been created.
When you compress a chunk, TimescaleDB compresses the tuples and inserts them into this compressed table. Its definition is shown below.

tsdb=# \d+ _timescaledb_internal._compressed_hypertable_85
                                                Table "_timescaledb_internal._compressed_hypertable_85"
        Column         |                 Type                  | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-----------------------+---------------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 tm                    | _timescaledb_internal.compressed_data |           |          |         | external |             | 0            | 
 c1                    | integer                               |           |          |         | plain    |             | 1000         | 
 c2                    | _timescaledb_internal.compressed_data |           |          |         | extended |             | 0            | 
 c3                    | _timescaledb_internal.compressed_data |           |          |         | extended |             | 0            | 
 c4                    | _timescaledb_internal.compressed_data |           |          |         | external |             | 0            | 
 _ts_meta_count        | integer                               |           |          |         | plain    |             | 1000         | 
 _ts_meta_sequence_num | integer                               |           |          |         | plain    |             | 1000         | 
 _ts_meta_min_1        | timestamp with time zone              |           |          |         | plain    |             | 1000         | 
 _ts_meta_max_1        | timestamp with time zone              |           |          |         | plain    |             | 1000         | 
Indexes:
    "_compressed_hypertable_85_c1__ts_meta_sequence_num_idx" btree (c1, _ts_meta_sequence_num)
Triggers:
    ts_insert_blocker BEFORE INSERT ON _timescaledb_internal._compressed_hypertable_85 FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap
Options: toast_tuple_target=128

Please notice the storage qualifier for columns tm, c2, c3 and c4. 'External' means the attribute can be toasted but not compressed.
To simulate the problem we need to create a table whose columns have external or extended storage; that is why I was asking you for a testcase. Anyway, I now have a testcase that reproduces the issue without TimescaleDB.
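
The lowered TOAST threshold is also visible in the internal table's reloptions (quick illustrative check, matching the output above):

SELECT relname, reloptions
FROM pg_class
WHERE relname = '_compressed_hypertable_85';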

Please create table like below with 460 columns:

  CREATE TABLE foo_ts (
      c1 char(25) default '01-02-2022',
      c2 char(25) default '01-02-2022',
      c3 char(25) default '01-02-2022',
     .......
      c459 char(25) default '01-02-2022',
      c460 char(25) default '01-02-2022'
  );

Now do an insert; it will fail with the same error.

postgres=# insert into foo_ts(c1) values ('hello');
ERROR:  row is too big: size 8304, maximum size 8160
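
For convenience, the same 460-column table can be generated instead of typing out every column; a psql sketch (\gexec is a psql feature, and the names mirror the example above):

-- Build and execute the CREATE TABLE statement for c1 .. c460, all char(25) with a default.
SELECT format(
    'CREATE TABLE foo_ts (%s);',
    string_agg(format('c%s char(25) default ''01-02-2022''', i), ', ')
)
FROM generate_series(1, 460) AS i \gexec

-- The insert then fails once the defaulted row no longer fits on a page.
INSERT INTO foo_ts(c1) VALUES ('hello');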

In short, we are hitting hard limits set by PostgreSQL, so IMHO this is expected behaviour.

@sb230132 sb230132 self-assigned this Aug 23, 2022
@justinpryzby
Author

justinpryzby commented Aug 24, 2022 via email

@sb230132
Contributor

Hi Justin,
That seems like a valid proposal. Reporting a warning and documenting the limits for compressed hypertables can be done.
I will come up with a solution.
Thanks.

sb230132 added a commit that referenced this issue Sep 17, 2022
Consider a compressed hypertable with many columns (more than 600, say).
In a call to compress_chunk(), the compressed tuple size exceeds 8K, which causes
the error "row is too big: size 10856, maximum size 8160."

This patch estimates the tuple size of the compressed hypertable and reports a
warning when compression is enabled on the hypertable, so the user is made aware
of the problem before calling compress_chunk().

Fixes #4398