Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor continuous aggregate code #5683

Merged
merged 1 commit into from
May 25, 2023

Conversation

RafiaSabih
Copy link
Contributor

@RafiaSabih RafiaSabih commented May 12, 2023

Spilt create.c file to more logical files.

Disable-check: force-changelog-changed

Co-authored by: Fabrízio de Royes Mello fabriziomello@gmail.com

@codecov
Copy link

codecov bot commented May 12, 2023

Codecov Report

Merging #5683 (590e57e) into main (29154b2) will increase coverage by 0.01%.
The diff coverage is 92.39%.

@@            Coverage Diff             @@
##             main    #5683      +/-   ##
==========================================
+ Coverage   87.66%   87.67%   +0.01%     
==========================================
  Files         231      234       +3     
  Lines       54803    54811       +8     
  Branches    12055    12052       -3     
==========================================
+ Hits        48041    48058      +17     
+ Misses       4926     4914      -12     
- Partials     1836     1839       +3     
Impacted Files Coverage Δ
tsl/src/continuous_aggs/create.c 90.58% <ø> (-1.07%) ⬇️
tsl/src/continuous_aggs/insert.c 87.02% <ø> (ø)
tsl/src/continuous_aggs/invalidation.c 94.10% <ø> (ø)
tsl/src/continuous_aggs/invalidation_threshold.c 76.92% <ø> (ø)
tsl/src/continuous_aggs/refresh.c 96.05% <ø> (ø)
tsl/src/init.c 96.29% <ø> (ø)
tsl/src/continuous_aggs/repair.c 85.57% <85.57%> (ø)
tsl/src/continuous_aggs/common.c 90.73% <90.73%> (ø)
tsl/src/continuous_aggs/finalize.c 95.93% <95.93%> (ø)
tsl/src/continuous_aggs/materialize.c 76.88% <100.00%> (+6.94%) ⬆️
... and 1 more

... and 27 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@fabriziomello fabriziomello added continuous_aggregate tech-debt Needs refactoring and improvement tasks related to the source code and its architecture. Team: Core Database labels May 12, 2023
Comment on lines 118 to 178
## Distribution of functions across files

create.c
This file contains the functions that are directly responsible for the creation of the continuous aggregates,
like creating hypertable, catalog_entry, view, etc.

cagg_create
create_cagg_catalog_entry
create_bucket_function_catalog_entry
cagg_create_hypertable
create_view_for_query
tsl_process_continuous_agg_viewstmt
check_trigger_exists_hypertable
cagg_add_trigger_hypertable
fixup_userview_query_tlist
mattablecolumninfo_add_mattable_index
mattablecolumninfo_create_materialization_table
mattablecolumninfo_get_partial_select_query
cagg_flip_realtime_view_definition
cagg_rename_view_columns

common.c
This file contains the functions common in all scenarios of creating a continuous aggregates.

check_time_bucket_argument
caggtimebucketinfo_init
caggtimebucket_validate
get_bucket_width
cagg_get_boundary_converter_funcoid
build_conversion_call
build_boundary_call
cagg_boundary_make_lower_bound
function_allowed_in_cagg_definition
mattablecolumninfo_addentry
cagg_agg_validate
cagg_query_supported
cagg_validate_query
remove_old_and_new_rte_from_query
destroy_union_query
build_union_query_quals
make_subquery_rte
build_union_query

finalize.c
This file contains the specific functions for the case when continous aggregates are created in old format.

get_input_types_array_datum
set_var_mapping
var_already_mapped
add_var_mutator
create_replace_having_qual_mutator
add_aggregate_partialize_mutator
get_finalize_function_oid
get_finalize_aggref
add_partialize_column
finalizequery_create_havingqual
finalizequery_init
finalizequery_get_select_query

materialize.c
This file contains the functions directly dealing with the materialization of the continuous aggregates.

spi_update_materializations
spi_delete_materializations
spi_insert_materializations
continuous_agg_update_materialization
mattablecolumninfo_init
mattablecolumninfo_addinternal
ranges_overlap
range_length
time_range_internal_to_min_time_value
time_range_internal_to_max_time_value
internal_to_time_value_or_infinite
internal_time_range_to_time_range

repair.c
The repair and rebuilding related functions are put together in this file

cagg_rebuild_view_definition
tsl_cagg_try_repair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need to mention all internal functions here in this README file. IMHO explaining the goal of each C file is enough (please put them in alphabetical order).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's excellent that there is documentation in the README, thank you for thinking of this.

Comment on lines 11 to 19
#define PRINT_MATCOLNAME(colbuf, type, original_query_resno, colno) \
do \
{ \
int ret = snprintf(colbuf, NAMEDATALEN, "%s_%d_%d", type, original_query_resno, colno); \
if (ret < 0 || ret >= NAMEDATALEN) \
ereport(ERROR, \
(errcode(ERRCODE_INTERNAL_ERROR), \
errmsg("bad materialization table column name"))); \
} while (0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about transform those macro functions into static inline?? also changing this horrible name to something like makeMaterializeColumnName

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question, I wonder why this was defined as a macro in the first place if you might know?

@RafiaSabih RafiaSabih force-pushed the cagg_refactor branch 3 times, most recently from 2553282 to b2da3da Compare May 17, 2023 17:22
#include "ts_catalog/catalog.h"
#include "ts_catalog/continuous_agg.h"

#define PARTIALFN "partialize_agg"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, please ignore if it's annoying, but since you're already refactoring the code maybe we can name this TS_PARTIALFN instead because now we have two different defs for the same function, once as PARTIALFN and once as TS_PARTIALFN


typedef struct AggPartCxt
static inline void
makeMaterializedTableName(char *buf, char *prefix, int hypertable_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
makeMaterializedTableName(char *buf, char *prefix, int hypertable_id)
makeMaterializedTableName(char *buf, const char *prefix, int hypertable_id)

Comment on lines +109 to +123
#define CAGG_MAKEQUERY(selquery, srcquery) \
do \
{ \
(selquery) = makeNode(Query); \
(selquery)->commandType = CMD_SELECT; \
(selquery)->querySource = (srcquery)->querySource; \
(selquery)->queryId = (srcquery)->queryId; \
(selquery)->canSetTag = (srcquery)->canSetTag; \
(selquery)->utilityStmt = copyObject((srcquery)->utilityStmt); \
(selquery)->resultRelation = 0; \
(selquery)->hasAggs = true; \
(selquery)->hasRowSecurity = false; \
(selquery)->rtable = NULL; \
} while (0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transform it into a static inline function like makeContinuousAggregateQuery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am keeping it as is for the moment, because it is being accessed in multiple files.

Comment on lines 124 to 125
#define CHUNKIDFROMRELID "chunk_id_from_relid"
#define CONTINUOUS_AGG_CHUNK_ID_COL_NAME "chunk_id"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are used just inside materialize.c so can be moved to there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And CONTINUOUS_AGG_CHUNK_ID_COL_NAME is also defined on create.h so need to remove from there

Comment on lines 127 to 129
#define MATPARTCOL_INTERVAL_FACTOR 10
#define HT_DEFAULT_CHUNKFN "calculate_chunk_interval"
#define CAGG_INVALIDATION_TRIGGER "continuous_agg_invalidation_trigger"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems those macros are used just inside create.c so we can move to there.

extern Var *mattablecolumninfo_addentry(MatTableColumnInfo *out, Node *input,
int original_query_resno, bool finalized,
bool *skip_adding);
extern Oid relation_oid(NameData schema, NameData name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we can generalize this one and move to utils.h and change all the places that do a combination of get_relname_relid, but this can be material for a following PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a small PR to refactor it: #5716

Comment on lines 44 to 45
void mattablecolumninfo_init(MatTableColumnInfo *matcolinfo, List *grouplist);
void mattablecolumninfo_addinternal(MatTableColumnInfo *matcolinfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't both functions be part of common.h and common.c because is where MatTableColumnInfo struct is defined.

@RafiaSabih RafiaSabih force-pushed the cagg_refactor branch 2 times, most recently from 5e773be to 85746a3 Compare May 21, 2023 08:36
@@ -23,6 +23,11 @@
#include "hypertable_cache.h"
#include "scan_iterator.h"

static void update_materialized_only(ContinuousAgg *agg, bool materialized_only);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static void update_materialized_only(ContinuousAgg *agg, bool materialized_only);
static void cagg_update_materialized_only(ContinuousAgg *agg, bool materialized_only);

* Check if the supplied OID belongs to a valid bucket function
* for continuous aggregates.
*/
bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool
static bool

Also add it to the beginning of this file and remove from common.h

* the user or direct view query.
*/
void
remove_old_and_new_rte_from_query(Query *query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
remove_old_and_new_rte_from_query(Query *query)
RemoveRangeTableEntries(Query *query)

Nit.

}

static RangeTblEntry *
make_subquery_rte(Query *subquery, const char *aliasname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
make_subquery_rte(Query *subquery, const char *aliasname)
makeRangeTblEntry(Query *query, const char *aliasname)

Nit.

Comment on lines 133 to 135
extern Var *mattablecolumninfo_addentry(MatTableColumnInfo *out, Node *input,
int original_query_resno, bool finalized,
bool *skip_adding);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be moved to finalize.c because it is the only place it is used

Comment on lines 111 to 120
(selquery) = makeNode(Query);
(selquery)->commandType = CMD_SELECT;
(selquery)->querySource = (srcquery)->querySource;
(selquery)->queryId = (srcquery)->queryId;
(selquery)->canSetTag = (srcquery)->canSetTag;
(selquery)->utilityStmt = copyObject((srcquery)->utilityStmt);
(selquery)->resultRelation = 0;
(selquery)->hasAggs = true;
(selquery)->hasRowSecurity = false;
(selquery)->rtable = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(selquery) = makeNode(Query);
(selquery)->commandType = CMD_SELECT;
(selquery)->querySource = (srcquery)->querySource;
(selquery)->queryId = (srcquery)->queryId;
(selquery)->canSetTag = (srcquery)->canSetTag;
(selquery)->utilityStmt = copyObject((srcquery)->utilityStmt);
(selquery)->resultRelation = 0;
(selquery)->hasAggs = true;
(selquery)->hasRowSecurity = false;
(selquery)->rtable = NULL;
selquery = makeNode(Query);
selquery->commandType = CMD_SELECT;
selquery->querySource = srcquery->querySource;
selquery->queryId = srcquery->queryId;
selquery->canSetTag = srcquery->canSetTag;
selquery->utilityStmt = copyObject(srcquery->utilityStmt);
selquery->resultRelation = 0;
selquery->hasAggs = true;
selquery->hasRowSecurity = false;
selquery->rtable = NULL;

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have started to review this, but it would be good if you could have a more elaborate text about what modules you have broken up the code into. It will help with understanding the purpose of each module and create good cohesion within the modules.

@RafiaSabih RafiaSabih force-pushed the cagg_refactor branch 2 times, most recently from fbf7d17 to 16d4541 Compare May 24, 2023 06:08
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that you have copied the code and not made drastic changes to it, so I haven't reviewed the code in detail. If that is not the case, please tell me.

} AggPartCxt;

/* Note that we set rowsecurity to false here. */
static inline makeContinuousAggregateQuery(Query *selquery, Query *srcquery)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static inline makeContinuousAggregateQuery(Query *selquery, Query *srcquery)
static inline void makeContinuousAggregateQuery(Query *selquery, Query *srcquery)

Comment on lines 111 to 112
(selquery) = makeNode(Query);
(selquery)->commandType = CMD_SELECT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work, the function is effectively a no-op. The variable selquery is local to the function so assigning a value to it will not be visible outside the function. You either need to change the type of the parameter, return a new pointer to a value, or turn this into a macro.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you write a header file for a module, try to limit the file to only contain what you are prepared to make part of the interface to the module. See some example comments below about definitions that look more like implementation details and is probably not something you want to export outside the module.

Comment on lines 124 to 126
#define INTERNAL_TO_DATE_FUNCTION "to_date"
#define INTERNAL_TO_TSTZ_FUNCTION "to_timestamp"
#define INTERNAL_TO_TS_FUNCTION "to_timestamp_without_timezone"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These only seem to be relevant for the implementation in common.c, so maybe move them there. They are also still present in create.c but not used there, so you might want to remove them.

(selquery)->rtable = NULL;
}

#define BOUNDARY_FUNCTION "cagg_watermark"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one exists in create.c but is not used there, and also looks like it belongs inside common.c, not as part of the module interface.

Comment on lines 138 to 150
cagg_add_trigger_hypertable
cagg_create
cagg_create_hypertable
cagg_flip_realtime_view_definition
cagg_rename_view_column
create_bucket_function_catalog_entry
create_cagg_catalog_entry
create_view_for_query
fixup_userview_query_tlist
mattablecolumninfo_add_mattable_index
mattablecolumninfo_create_materialization_table
mattablecolumninfo_get_partial_select_query
tsl_process_continuous_agg_viewstmt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a markdown file you probably need to format these better. Also, not sure you actually need to list the functions in the module since that will change over time and there is no way we will be able to keep it up to date. I would suggest that you just state the purpose of each module (header file + source fine with same base name) and stop there.

Comment on lines 84 to 93
Oid
relation_oid(NameData schema, NameData name)
{
return get_relname_relid(NameStr(name), get_namespace_oid(NameStr(schema), false));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type NameData is a struct, and even though it is small, it does not seem necessary to copy it with each function call.

Suggested change
Oid
relation_oid(NameData schema, NameData name)
{
return get_relname_relid(NameStr(name), get_namespace_oid(NameStr(schema), false));
}
Oid
relation_oid(Name schema, Name name)
{
return get_relname_relid(NameStr(*name), get_namespace_oid(NameStr(*schema), false));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still stands, I think it is better to use Name.

return true; /* Query was OK and is supported. */
}

static inline Datum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: compilers usually ignore inline.

Suggested change
static inline Datum
static Datum

return width;
}

static inline int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static inline int64
static int64

@RafiaSabih RafiaSabih force-pushed the cagg_refactor branch 3 times, most recently from 59ab7c1 to 1f40bdf Compare May 25, 2023 10:24
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing that I think you need to fix. Otherwise looks good.

Comment on lines 84 to 93
Oid
relation_oid(NameData schema, NameData name)
{
return get_relname_relid(NameStr(name), get_namespace_oid(NameStr(schema), false));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still stands, I think it is better to use Name.

@RafiaSabih RafiaSabih enabled auto-merge (rebase) May 25, 2023 13:14
@RafiaSabih RafiaSabih merged commit 70d0704 into timescale:main May 25, 2023
56 of 57 checks passed
@RafiaSabih RafiaSabih deleted the cagg_refactor branch May 25, 2023 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
continuous_aggregate Team: Core Database tech-debt Needs refactoring and improvement tasks related to the source code and its architecture.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants