Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create aggregate functions only once to avoid dependency issues #640

Merged
merged 1 commit into from
Aug 6, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
13 changes: 12 additions & 1 deletion sql/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,13 @@ set(CATALOG_SOURCE_FILES
tables.sql
)

# Things like aggregate functions cannot be REPLACEd and really
# need to be created just once(like CATALOG_SOURCE_FILES)
# but unlike CATALOG_SOURCE_FILES these have to be loaded
# after everything else is loaded.
set(IMMUTABLE_API_SOURCE_FILES
aggregates.sql
)
# The rest of the source files defining mostly functions
set(SOURCE_FILES
chunk.sql
Expand Down Expand Up @@ -78,6 +85,7 @@ endfunction()
# directory of all our source files
version_files("${PRE_UPDATE_FILES}" PRE_UPDATE_FILES_VERSIONED)
version_files("${CATALOG_SOURCE_FILES}" CATALOG_SOURCE_FILES_VERSIONED)
version_files("${IMMUTABLE_API_SOURCE_FILES}" IMMUTABLE_API_SOURCE_FILES_VERSIONED)
version_files("${SOURCE_FILES}" SOURCE_FILES_VERSIONED)
version_files("${MOD_FILES}" MOD_FILES_VERSIONED)

Expand Down Expand Up @@ -106,7 +114,10 @@ function(cat_files SRC_FILE_LIST OUTPUT_FILE)
endfunction()

# Generate the extension file used with CREATE EXTENSION
cat_files("${CATALOG_SOURCE_FILES_VERSIONED};${SOURCE_FILES_VERSIONED}" ${CMAKE_CURRENT_BINARY_DIR}/${INSTALL_FILE})
cat_files(
"${CATALOG_SOURCE_FILES_VERSIONED};${SOURCE_FILES_VERSIONED};${IMMUTABLE_API_SOURCE_FILES_VERSIONED}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a weird extra return right here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line was getting too long that is why I split this to multiple lines.
My thought was that if we later add more parts to the build pipeline, we can concat the string form the next line

${CMAKE_CURRENT_BINARY_DIR}/${INSTALL_FILE}
)
add_custom_target(sqlfile ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${INSTALL_FILE})

# Generate the update files used with ALTER EXTENSION <name> UPDATE
Expand Down
48 changes: 48 additions & 0 deletions sql/aggregates.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
-- This file is meant to contain aggregate functions that need to be created only
-- once and not recreated during updates.
-- There is no CREATE OR REPLACE AGGREGATE which means that the only way to replace
-- an aggregate is to DROP then CREATE which is problematic as it will fail
-- if the previous version of the aggregate has dependencies.
-- NOTE that WHEN CREATING NEW FUNCTIONS HERE you should also make sure they are
-- created in an update script so that both new users and people updating from a
-- previous version get the new function


--This aggregate returns the "first" element of the first argument when ordered by the second argument.
--Ex. first(temp, time) returns the temp value for the row with the lowest time
CREATE AGGREGATE first(anyelement, "any") (
SFUNC = _timescaledb_internal.first_sfunc,
STYPE = internal,
COMBINEFUNC = _timescaledb_internal.first_combinefunc,
SERIALFUNC = _timescaledb_internal.bookend_serializefunc,
DESERIALFUNC = _timescaledb_internal.bookend_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.bookend_finalfunc,
FINALFUNC_EXTRA
);

--This aggregate returns the "last" element of the first argument when ordered by the second argument.
--Ex. last(temp, time) returns the temp value for the row with highest time
CREATE AGGREGATE last(anyelement, "any") (
SFUNC = _timescaledb_internal.last_sfunc,
STYPE = internal,
COMBINEFUNC = _timescaledb_internal.last_combinefunc,
SERIALFUNC = _timescaledb_internal.bookend_serializefunc,
DESERIALFUNC = _timescaledb_internal.bookend_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.bookend_finalfunc,
FINALFUNC_EXTRA
);

-- This aggregate partitions the dataset into a specified number of buckets (nbuckets) ranging
-- from the inputted min to max values.
CREATE AGGREGATE histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER) (
SFUNC = _timescaledb_internal.hist_sfunc,
STYPE = INTERNAL,
COMBINEFUNC = _timescaledb_internal.hist_combinefunc,
SERIALFUNC = _timescaledb_internal.hist_serializefunc,
DESERIALFUNC = _timescaledb_internal.hist_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.hist_finalfunc,
FINALFUNC_EXTRA
);
28 changes: 0 additions & 28 deletions sql/bookend.sql
Original file line number Diff line number Diff line change
Expand Up @@ -32,31 +32,3 @@ CREATE OR REPLACE FUNCTION _timescaledb_internal.bookend_deserializefunc(bytea,
RETURNS internal
AS '@MODULE_PATHNAME@', 'bookend_deserializefunc'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

--This aggregate returns the "first" element of the first argument when ordered by the second argument.
--Ex. first(temp, time) returns the temp value for the row with the lowest time
DROP AGGREGATE IF EXISTS first(anyelement, "any");
CREATE AGGREGATE first(anyelement, "any") (
SFUNC = _timescaledb_internal.first_sfunc,
STYPE = internal,
COMBINEFUNC = _timescaledb_internal.first_combinefunc,
SERIALFUNC = _timescaledb_internal.bookend_serializefunc,
DESERIALFUNC = _timescaledb_internal.bookend_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.bookend_finalfunc,
FINALFUNC_EXTRA
);

--This aggregate returns the "last" element of the first argument when ordered by the second argument.
--Ex. last(temp, time) returns the temp value for the row with highest time
DROP AGGREGATE IF EXISTS last(anyelement, "any");
CREATE AGGREGATE last(anyelement, "any") (
SFUNC = _timescaledb_internal.last_sfunc,
STYPE = internal,
COMBINEFUNC = _timescaledb_internal.last_combinefunc,
SERIALFUNC = _timescaledb_internal.bookend_serializefunc,
DESERIALFUNC = _timescaledb_internal.bookend_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.bookend_finalfunc,
FINALFUNC_EXTRA
);
13 changes: 0 additions & 13 deletions sql/histogram.sql
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,3 @@ CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_finalfunc(state INTERNAL,
RETURNS INTEGER[]
AS '@MODULE_PATHNAME@', 'hist_finalfunc'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

-- Tell Postgres how to use the new function
DROP AGGREGATE IF EXISTS histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER);
CREATE AGGREGATE histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER) (
SFUNC = _timescaledb_internal.hist_sfunc,
STYPE = INTERNAL,
COMBINEFUNC = _timescaledb_internal.hist_combinefunc,
SERIALFUNC = _timescaledb_internal.hist_serializefunc,
DESERIALFUNC = _timescaledb_internal.hist_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.hist_finalfunc,
FINALFUNC_EXTRA
);
39 changes: 39 additions & 0 deletions sql/updates/0.4.2--0.5.0.sql
Original file line number Diff line number Diff line change
Expand Up @@ -246,3 +246,42 @@ DROP FUNCTION IF EXISTS _timescaledb_internal.drop_trigger_on_all_chunks(INTEGER
DROP FUNCTION IF EXISTS _timescaledb_internal.get_general_trigger_definition(regclass);
DROP FUNCTION IF EXISTS _timescaledb_internal.get_trigger_definition_for_table(INTEGER, text);
DROP FUNCTION IF EXISTS _timescaledb_internal.need_chunk_trigger(int, oid);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the update strategy a bit more? I'm a little confused why you have only changed the 0.4.2-->0.5.0 update script. Say I'm running version 0.8.0 (some arbitrary version in between 0.5.0 and 0.11.0), then these aggregates are already (incorrectly) installed. How am I going to get your fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should have been clearer about that. Maybe I should add this as comment in the commit.

Aggregates are not ever incorrectly installed, at least that is not what this fixes.
Previously, if you had 0.8 installed and had some dependancies on first, last and histogram, then update would fail because update script would try to do
DROP AGGREGATE ... (see current master branch of bookend.sql)
But there is no reason to drop and recreate aggregates every time as to modify their behaviour.Wwe can CREATE AND REPLACE the functions these aggregates refer too.
So if you were to update from 0.8.0 aggregates would not be recreated as they are created once in aggregates.sql when you install timescale for the first time but histogram.sql and bookend.sql would recreate all the functions the aggregate refers. But there is no problem here because histogram.sql and bookend.sql would do CREATE OR REPLACE which does not throw an error in case of dependancies.

I changed that specific update file because that is when histogram was introduced. So if you were to install a version prior to 0.5.0, you would not CREATE AGGREGATE histogram.
but because aggregate.sql is run once in install, changing it would not suffice for an update.
That is why I create the aggregate in the update file.

It is a little ugly, but I also have to copy-paste code to create all the dependency functions. This is because update scripts are run before 0.5.0 changed sql scripts and so if I were to just do CREATE AGGREGATE, the state and transition functions inside the aggregate would not have been created and would not be recognized.

I have not modified any update scripts for first, last because they were introduced in 0.0.4 or so and the first update script is at 0.1.0.

-- Adding this in the update script because aggregates.sql is not rerun in case of an update
CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_sfunc (state INTERNAL, val DOUBLE PRECISION, MIN DOUBLE PRECISION, MAX DOUBLE PRECISION, nbuckets INTEGER)
RETURNS INTERNAL
AS '@MODULE_PATHNAME@', 'hist_sfunc'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_combinefunc(state1 INTERNAL, state2 INTERNAL)
RETURNS INTERNAL
AS '@MODULE_PATHNAME@', 'hist_combinefunc'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_serializefunc(INTERNAL)
RETURNS bytea
AS '@MODULE_PATHNAME@', 'hist_serializefunc'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_deserializefunc(bytea, INTERNAL)
RETURNS INTERNAL
AS '@MODULE_PATHNAME@', 'hist_deserializefunc'
LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;

CREATE OR REPLACE FUNCTION _timescaledb_internal.hist_finalfunc(state INTERNAL, val DOUBLE PRECISION, MIN DOUBLE PRECISION, MAX DOUBLE PRECISION, nbuckets INTEGER)
RETURNS INTEGER[]
AS '@MODULE_PATHNAME@', 'hist_finalfunc'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

-- This aggregate partitions the dataset into a specified number of buckets (nbuckets) ranging
-- from the inputted min to max values.
CREATE AGGREGATE histogram (DOUBLE PRECISION, DOUBLE PRECISION, DOUBLE PRECISION, INTEGER) (
SFUNC = _timescaledb_internal.hist_sfunc,
STYPE = INTERNAL,
COMBINEFUNC = _timescaledb_internal.hist_combinefunc,
SERIALFUNC = _timescaledb_internal.hist_serializefunc,
DESERIALFUNC = _timescaledb_internal.hist_deserializefunc,
PARALLEL = SAFE,
FINALFUNC = _timescaledb_internal.hist_finalfunc,
FINALFUNC_EXTRA
);
7 changes: 6 additions & 1 deletion sql/updates/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ to consider.
commands that change or remove objects. In some cases,
modifications of metadata are also necessary.
2. `DROP FUNCTION` needs to be idempotent. In most cases that means
commands should have an `IF NOT EXISTS` clause. The reason is that
commands should have an `IF EXISTS` clause. The reason is that
some modfiles might try to, e.g., `DROP` functions that aren't
present because they only exist in an intermediate version of the
database, which is skipped over.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a 5th bullet to this list with the same note that you have in aggregates.sql? Basically something like "If a new aggregate was added in the new version, it should also be created in the modfile leading to that new version"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated and rebased

Expand All @@ -39,6 +39,11 @@ to consider.
4. The creation of new metadata tables need to be part of modfiles,
similar to `ALTER`s of such tables. Otherwise, later modfiles
cannot rely on those tables being present.
5. When creating a new aggregate, the `CREATE` statement should be
added to both aggregate.sql AND an update file. aggregate.sql is
run once when TimescaleDB is installed so adding a definition in
an update file is the only way to ensure that upgrading users get
the new function.

Note that modfiles that contain no changes need not exist as a
file. Transition modfiles must, however, be listed in the
Expand Down