Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill #4641

Merged
merged 2 commits into from Aug 22, 2022

@svenklemm svenklemm commented Aug 22, 2022

This patch allows bucketing by month for time_bucket with date,
timestamp or timestamptz. When bucketing by month the interval
must only contain month components.
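
The "month components only" rule can be illustrated with a small standalone sketch. This is not the code from the patch: `SketchInterval`, `BucketKind` and `classify_bucket_width` are hypothetical names, and the struct only mirrors the three fields of PostgreSQL's `Interval` (time, day, month).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Interval (hypothetical, for illustration only). */
typedef struct SketchInterval
{
	int64_t time;  /* sub-day part, in microseconds */
	int32_t day;   /* whole days */
	int32_t month; /* whole months; years and centuries fold into this field */
} SketchInterval;

typedef enum BucketKind
{
	BUCKET_FIXED,   /* no month part: width is a fixed span of time */
	BUCKET_MONTHLY, /* month part only: width is a whole number of months */
	BUCKET_INVALID  /* month part mixed with days or time */
} BucketKind;

/*
 * Classify a bucket width. A width with a month part must not carry
 * day or time parts. Checking that the width is positive is out of
 * scope for this sketch.
 */
BucketKind
classify_bucket_width(const SketchInterval *width)
{
	assert(width != NULL);

	if (width->month != 0)
	{
		if (width->day != 0 || width->time != 0)
			return BUCKET_INVALID;
		return BUCKET_MONTHLY;
	}
	return BUCKET_FIXED;
}
```

Under these assumptions a width of `1 year` (12 months) is classified as monthly, while `1 month 1 day` is rejected.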

Fixes #4005

Disable-check: commit-count

codecov bot commented Aug 22, 2022

Codecov Report

Merging #4641 (33ec376) into main (1f6d697) will decrease coverage by 0.04%.
The diff coverage is 92.85%.

@@            Coverage Diff             @@
##             main    #4641      +/-   ##
==========================================
- Coverage   90.77%   90.72%   -0.05%     
==========================================
  Files         224      224              
  Lines       41915    41776     -139     
==========================================
- Hits        38047    37902     -145     
- Misses       3868     3874       +6     
| Impacted Files | Coverage Δ |
|---|---|
| src/cross_module_fn.c | 67.87% <0.00%> (-2.57%) ⬇️ |
| src/time_bucket.c | 98.25% <100.00%> (-0.16%) ⬇️ |
| tsl/src/nodes/gapfill/exec.c | 97.08% <100.00%> (+0.06%) ⬆️ |
| tsl/src/remote/dist_ddl.c | 95.82% <100.00%> (-0.02%) ⬇️ |
| src/partitioning.c | 85.25% <0.00%> (-3.85%) ⬇️ |
| src/loader/bgw_message_queue.c | 85.52% <0.00%> (-2.64%) ⬇️ |
| src/import/planner.c | 62.50% <0.00%> (-1.81%) ⬇️ |
| src/bgw/scheduler.c | 82.63% <0.00%> (-1.15%) ⬇️ |
| src/nodes/chunk_dispatch_state.c | 94.59% <0.00%> (-0.59%) ⬇️ |
| ... and 30 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@svenklemm svenklemm force-pushed the time_bucket_month branch 5 times, most recently from 63220eb to 17c9d55 Compare August 22, 2022 08:17
@svenklemm svenklemm changed the title from "Allow bucketing by month in time_bucket" to "Allow bucketing by month in time_bucket and time_bucket_gapfill" Aug 22, 2022

@akuzm akuzm left a comment

Looks like the gapfill tests are missing.

@gayyappan gayyappan left a comment

minor comments.

}
}

static DateADT

Could you add some high-level comments about this function?

return interval_to_usec(DatumGetIntervalP(arg));
Interval *interval_arg = DatumGetIntervalP(arg);
if (interval_arg->month)
{

Do we need a test for gapfill with months?

@@ -96,6 +96,7 @@ typedef struct GapFillState
int64 gapfill_start;
int64 gapfill_end;
int64 gapfill_period;
Interval *gapfill_interval;

It would be good to add a comment about how gapfill_period and gapfill_interval are used.

@konskov konskov left a comment

I think the error message in get_interval_period_timestamp_units ("interval defined in terms of month, year, century etc. not supported") can now be updated by removing the "month" part

@svenklemm svenklemm added this to the TimescaleDB 2.8 milestone Aug 22, 2022
@svenklemm svenklemm force-pushed the time_bucket_month branch 2 times, most recently from fd76ba2 to 7f556a9 Compare August 22, 2022 16:10
This patch allows bucketing by month for time_bucket with date,
timestamp or timestamptz. When bucketing by month the interval
must only contain month components. When using origin together
with bucketing by month only the year and month components are
honoured.

To bucket by month we get the year and month of a date and convert
that to the nth month since origin. This allows us to treat month
bucketing similar to int bucketing. During this process we ignore
the day component and therefore only support bucketing by full months.
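
The approach described in this commit message can be sketched as below. This is a minimal standalone illustration, not the code in src/time_bucket.c: `YearMonth`, `floor_div`, `month_index` and `month_bucket_start` are hypothetical names, and the origin is passed in explicitly rather than assuming any particular default.

```c
#include <assert.h>
#include <stdint.h>

/* Year and month of a date; the day component is ignored on purpose. */
typedef struct YearMonth
{
	int32_t year;
	int32_t month; /* 1..12 */
} YearMonth;

/* Integer division rounding toward negative infinity. */
static int32_t
floor_div(int32_t a, int32_t b)
{
	int32_t q = a / b;

	if ((a % b != 0) && ((a < 0) != (b < 0)))
		q--;
	return q;
}

/* Map a year/month pair onto a single running month number. */
static int32_t
month_index(YearMonth ym)
{
	return ym.year * 12 + (ym.month - 1);
}

/*
 * Return the first month of the bucket containing ts, for buckets that
 * are period_months wide and aligned to origin. Once dates are reduced
 * to month numbers this is the same arithmetic as integer bucketing.
 */
YearMonth
month_bucket_start(YearMonth ts, YearMonth origin, int32_t period_months)
{
	assert(period_months > 0);
	assert(ts.month >= 1 && ts.month <= 12);
	assert(origin.month >= 1 && origin.month <= 12);

	int32_t origin_idx = month_index(origin);
	int32_t offset = month_index(ts) - origin_idx; /* nth month since origin */
	int32_t start_idx = origin_idx + floor_div(offset, period_months) * period_months;
	int32_t year = floor_div(start_idx, 12);
	YearMonth result = { year, start_idx - year * 12 + 1 };

	return result;
}
```

With a 3-month width and an origin of January 2000 (chosen only for illustration), August 2022 falls into the bucket starting July 2022, and December 1999 into the bucket starting October 1999. Floor division is what keeps dates before the origin in the correct bucket.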
@svenklemm svenklemm changed the title from "Allow bucketing by month in time_bucket and time_bucket_gapfill" to "Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill" Aug 22, 2022
@svenklemm svenklemm merged commit 1c0bf4b into timescale:main Aug 22, 2022
svenklemm added a commit to svenklemm/timescaledb that referenced this pull request Aug 31, 2022
This release adds major new features since the 2.7.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* time_bucket now supports bucketing by month, year and timezone
* Improve performance of bulk SELECT and COPY for distributed hypertables
* 1 step CAgg policy management
* Migrate Continuous Aggregates to the new format

**Features**
* timescale#4188 Use COPY protocol in row-by-row fetcher
* timescale#4307 Mark partialize_agg as parallel safe
* timescale#4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE
* timescale#4384 Add schedule_interval to policies
* timescale#4390 Faster lookup of chunks by point
* timescale#4393 Support intervals with day component when constifying now()
* timescale#4397 Support intervals with month component when constifying now()
* timescale#4405 Support ON CONFLICT ON CONSTRAINT for hypertables
* timescale#4412 Add telemetry about replication
* timescale#4415 Drop remote data when detaching data node
* timescale#4416 Handle TRUNCATE TABLE on chunks
* timescale#4425 Add parameter check_config to alter_job
* timescale#4430 Create index on Continuous Aggregates
* timescale#4439 Allow ORDER BY on continuous aggregates
* timescale#4443 Add stateful partition mappings
* timescale#4484 Use non-blocking data node connections for COPY
* timescale#4495 Support add_dimension() with existing data
* timescale#4502 Add chunks to baserel cache on chunk exclusion
* timescale#4545 Add hypertable distributed argument and defaults
* timescale#4552 Migrate Continuous Aggregates to the new format
* timescale#4556 Add runtime exclusion for hypertables
* timescale#4561 Change get_git_commit to return full commit hash
* timescale#4563 1 step CAgg policy management
* timescale#4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill
* timescale#4642 Add timezone support to time_bucket

**Bugfixes**
* timescale#4359 Create composite index on segmentby columns
* timescale#4374 Remove constified now() constraints from plan
* timescale#4416 Handle TRUNCATE TABLE on chunks
* timescale#4478 Synchronize chunk cache sizes
* timescale#4486 Adding boolean column with default value doesn't work on compressed table
* timescale#4512 Fix unaligned pointer access
* timescale#4519 Throw better error message on incompatible row fetcher settings
* timescale#4549 Fix dump_meta_data for windows
* timescale#4553 Fix timescaledb_post_restore GUC handling
* timescale#4573 Load TSL library on compressed_data_out call
* timescale#4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function
* timescale#4577 Fix segfaults in compression code with corrupt data
* timescale#4580 Handle default privileges on CAggs properly
* timescale#4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA
* timescale#4583 Fix partitioning functions
* timescale#4589 Fix rename for distributed hypertable
* timescale#4601 Reset compression sequence when group resets
* timescale#4611 Fix a potential OOM when loading large data sets into a hypertable
* timescale#4624 Fix heap buffer overflow
* timescale#4627 Fix telemetry initialization
* timescale#4631 Ensure TSL library is loaded on database upgrades
* timescale#4646 Fix time_bucket_ng origin handling
* timescale#4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause.

**Thanks**
* @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function
* @Creatation for reporting an issue with renaming hypertables
* @janko for reporting an issue when adding bool column with default value to compressed hypertable
* @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk
* @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates
* @mwahlhuetter for reporting error in joins in RETURNING clause
* @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable
* @PBudmark for reporting an issue with dump_meta_data.sql on Windows
* @ssmoss for reporting an issue with time_bucket_ng origin handling
@svenklemm svenklemm mentioned this pull request Aug 31, 2022
svenklemm added a commit that referenced this pull request Aug 31, 2022
Development

Successfully merging this pull request may close these issues.

Implement time_bucket_gapfill_ng()
4 participants