Use numrows_pre_compression in approx row count #6365

Merged
merged 1 commit into from Dec 4, 2023

Contributor

@nikkhils nikkhils commented Nov 30, 2023

The approximate_row_count function was using reltuples from compressed chunks and multiplying it by 1000, the default batch size. This led to a huge skew between the actual row count and the approximate one. We now use the numrows_pre_compression value from the timescaledb catalog, which accurately represents the number of rows before compression.
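To make the skew concrete, here is a minimal Python sketch of the two estimates as the description states them. It is an illustration only, not the extension's code: the `CompressedChunk` class, the helper names, and the sample numbers are hypothetical, and only `reltuples`, `numrows_pre_compression`, and the batch size of 1000 come from the description above.

```python
from dataclasses import dataclass

# Default number of rows per compressed batch, as stated in the PR description.
DEFAULT_BATCH_SIZE = 1000


@dataclass
class CompressedChunk:
    """Hypothetical stand-in for the statistics kept about one compressed chunk."""
    reltuples: float              # planner estimate of compressed batch rows
    numrows_pre_compression: int  # row count recorded before compression


def old_estimate(chunks):
    # Previous behaviour as described: assume every batch holds 1000 rows.
    return sum(int(c.reltuples * DEFAULT_BATCH_SIZE) for c in chunks)


def new_estimate(chunks):
    # New behaviour as described: use the count recorded before compression.
    return sum(c.numrows_pre_compression for c in chunks)


# Made-up data: many small batches that are far from full.
chunks = [
    CompressedChunk(reltuples=50, numrows_pre_compression=1_200),
    CompressedChunk(reltuples=40, numrows_pre_compression=900),
]

print(old_estimate(chunks))  # 90000
print(new_estimate(chunks))  # 2100
```

With these invented numbers the old estimate is more than forty times the real row count, because each batch holds far fewer than 1000 rows.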

@nikkhils nikkhils self-assigned this Nov 30, 2023

@erimatnor, @mahipv: please review this pull request.

codecov bot commented Nov 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ef030d2) 86.96% compared to head (e25a779) 82.41%.

❗ Current head e25a779 differs from pull request most recent head 6800284. Consider uploading reports for the commit 6800284 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6365      +/-   ##
==========================================
- Coverage   86.96%   82.41%   -4.56%     
==========================================
  Files         249      249              
  Lines       57966    57966              
  Branches    12903    12901       -2     
==========================================
- Hits        50411    47773    -2638     
- Misses       5176     6765    +1589     
- Partials     2379     3428    +1049     

Contributor

@fabriziomello fabriziomello left a comment

LGTM.

Just one comment: don't reference SDC issues in this public repo, because other users don't have access to them. Either create a corresponding issue in the public repository or give more details in the PR about the problem you're solving. :-)

Contributor Author

nikkhils commented Dec 1, 2023

> LGTM.
>
> Just one comment: don't reference SDC issues in this public repo, because other users don't have access to them. Either create a corresponding issue in the public repository or give more details in the PR about the problem you're solving. :-)

Yeah, this is mostly for cross-linking with the SDC to allow its auto-closure. It has been removed from the original commit.

Contributor

@mkindahl mkindahl left a comment

The reference to the internal support case does not seem to be useful so you can remove it. Also wondering about the test coverage.

Comment on lines 1274 to +1293
SELECT approximate_row_count('stattest');
approximate_row_count
-----------------------
0
26
Contributor

You might want to add a test that compares this with the actual number of rows counted explicitly. For cases where you have serial execution, you should get the same value.

The test also seems to be missing some cases where you have a mix of uncompressed and compressed rows in a chunk, so might be good to verify that.
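The kind of check suggested here can be sketched outside the database. The toy model below is an assumption and not the extension's implementation: it supposes that the estimate for a chunk is its recorded `numrows_pre_compression` plus the analyzed `reltuples` of its uncompressed part, and every class and method name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Toy chunk: rows live in either the uncompressed or the compressed part."""
    uncompressed_rows: list = field(default_factory=list)
    compressed_rows: list = field(default_factory=list)
    numrows_pre_compression: int = 0  # recorded when compress() runs
    reltuples: float = 0.0            # refreshed by analyze(), uncompressed part only

    def insert(self, rows):
        self.uncompressed_rows.extend(rows)

    def compress(self):
        # Move all uncompressed rows to the compressed part and record the count.
        self.numrows_pre_compression += len(self.uncompressed_rows)
        self.compressed_rows.extend(self.uncompressed_rows)
        self.uncompressed_rows = []
        self.reltuples = 0.0

    def analyze(self):
        # Stand-in for refreshing planner statistics on the uncompressed part.
        self.reltuples = float(len(self.uncompressed_rows))


def approximate_row_count(chunks):
    # Assumed model: pre-compression count plus the uncompressed-row estimate.
    return sum(c.numrows_pre_compression + int(c.reltuples) for c in chunks)


def exact_row_count(chunks):
    return sum(len(c.compressed_rows) + len(c.uncompressed_rows) for c in chunks)


fully_compressed = Chunk()
fully_compressed.insert(range(26))
fully_compressed.compress()

mixed = Chunk()
mixed.insert(range(10))
mixed.compress()
mixed.insert(range(4))  # rows added after compression stay uncompressed
mixed.analyze()

chunks = [fully_compressed, mixed]
assert approximate_row_count(chunks) == exact_row_count(chunks) == 40
```

In this model the two counts agree only because statistics are refreshed after the late insert; skipping `analyze()` on the mixed chunk would leave the estimate four rows short, which is the sort of case a mixed-chunk test would expose.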

Contributor Author

@mkindahl many additional tests added to this PR now

Contributor Author

nikkhils commented Dec 1, 2023

> The reference to the internal support case does not seem to be useful so you can remove it. Also wondering about the test coverage.

removed

@nikkhils nikkhils merged commit 293104a into timescale:main Dec 4, 2023
43 checks passed
@nikkhils nikkhils deleted the comp_rows branch December 4, 2023 16:57
@jnidzwetzki jnidzwetzki added the force-auto-backport label (automatically backport this PR or fix of this issue, even if it's not marked as "bug") Jan 3, 2024
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 3, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
@jnidzwetzki jnidzwetzki mentioned this pull request Jan 3, 2024
jnidzwetzki added a commit that referenced this pull request Jan 4, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* #6365 Use numrows_pre_compression in approximate row count
* #6377 Use processed group clauses in PG16
* #6384 Change bgw_log_level to use PGC_SUSET
* #6393 Disable vectorized sum for expressions.
* #6408 Fix groupby pathkeys for gapfill in PG16
* #6428 Fix index matching during DML decompression
* #6439 Fix compressed chunk permission handling on PG16
* #6443 Fix lost concurrent CAgg updates
* #6454 Fix unique expression indexes on compressed chunks
* #6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 4, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6405 Read CAgg watermark from materialized data
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
jnidzwetzki added a commit that referenced this pull request Jan 9, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* #6365 Use numrows_pre_compression in approximate row count
* #6377 Use processed group clauses in PG16
* #6384 Change bgw_log_level to use PGC_SUSET
* #6393 Disable vectorized sum for expressions.
* #6405 Read CAgg watermark from materialized data
* #6408 Fix groupby pathkeys for gapfill in PG16
* #6428 Fix index matching during DML decompression
* #6439 Fix compressed chunk permission handling on PG16
* #6443 Fix lost concurrent CAgg updates
* #6454 Fix unique expression indexes on compressed chunks
* #6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
Labels
backported-2.13.x, force-auto-backport (automatically backport this PR or fix of this issue, even if it's not marked as "bug")
5 participants