
feat: Add new DB column messageHash #2202

Merged
merged 9 commits into from
Nov 22, 2023
Conversation

ABresting
Contributor

@ABresting ABresting commented Nov 8, 2023

Description

DB column messageHash added in both SQLite and Postgres. More about this PR can be found at this research issue

Changes

  • messageHash column added in SQLite
  • messageHash column added in Postgres
  • DB migration script added

Issue

#2112
closes #2229

@ABresting ABresting added the release-notes label Nov 8, 2023
@ABresting ABresting self-assigned this Nov 8, 2023

github-actions bot commented Nov 8, 2023

This PR may contain changes to database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the label release-notes is added to make sure upgrade instructions properly highlight this change.


github-actions bot commented Nov 8, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2202

Built from 104a717

Copy link
Contributor

@SionoiS SionoiS left a comment


Thanks!

tests/waku_store/test_resume.nim (thread outdated, resolved)
Contributor


I kind of understand what this does, but SQL is not my forte.

Contributor

@jm-clius jm-clius left a comment


LGTM in general. Have left some comments on selecting an appropriate primary key and whether we need to maintain the previous index. Since DB migration is a delicate process, we should test the migration from previous version (or two versions?) of nwaku with populated DB to this version.

  " storedAt BIGINT NOT NULL," &
- " CONSTRAINT messageIndex PRIMARY KEY (storedAt, id, pubsubTopic)" &
+ " CONSTRAINT messageIndex PRIMARY KEY (storedAt, messageHash)" &
Contributor


Why do we need both storedAt and messageHash as the PRIMARY KEY? Since messageHash is unique and not null, shouldn't that alone be enough?
We may still need an index (not a primary key) on (storedAt, id, pubsubTopic) for query performance, though, since we still query these attributes. @Ivansete-status may have a better idea.

Contributor Author


First half of your question:
Short answer: storedAt is required; otherwise the integrity of the DB would be broken.
Long answer: see https://github.com/waku-org/nwaku/blob/master/tests/testlib/wakucore.nim#L54 — payload and meta are optional there. If we remove storedAt from the primary key, some test cases will fail: an app may send messages with identical data (payload, meta, content topic and pubsub topic, i.e. everything used for the messageHash computation), and those messages would then collide on messageHash alone.
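To illustrate the collision argument with a hedged sketch (this is NOT nwaku's actual hashing code; the real algorithm is the subject of issue #2215, and the topic/payload values below are made up): two messages sharing all hash inputs produce the same messageHash, so a messageHash-only primary key would reject the second insert.

```python
import hashlib

# Illustrative stand-in for the deterministic message hash; the real nwaku
# algorithm differs (see #2215), but the collision argument is the same.
def message_hash(pubsub_topic: bytes, payload: bytes,
                 content_topic: bytes, meta: bytes) -> bytes:
    return hashlib.sha256(pubsub_topic + payload + content_topic + meta).digest()

# Two messages with identical payload, meta, content topic and pubsub topic:
a = message_hash(b"/waku/2/default-waku/proto", b"hi", b"/toy/1/chat/proto", b"")
b = message_hash(b"/waku/2/default-waku/proto", b"hi", b"/toy/1/chat/proto", b"")

assert a == b  # identical inputs yield an identical messageHash
```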

Collaborator


I agree with your statement @jm-clius. Better have the messageHash as the primary KEY.

  " storedAt INTEGER NOT NULL," &
- " CONSTRAINT messageIndex PRIMARY KEY (storedAt, id, pubsubTopic)" &
+ " CONSTRAINT messageIndex PRIMARY KEY (storedAt, messageHash)" &
Contributor


Same question as for Postgres: unsure why we'd need both storedAt and messageHash in the primary key.

id BLOB,
messageHash BLOB, -- Newly added, this will be populated with a counter value
storedAt INTEGER NOT NULL,
CONSTRAINT messageIndex PRIMARY KEY (storedAt, messageHash)
Contributor


Have noted elsewhere, but:

  1. I would imagine we only need messageHash as the primary key?
  2. Not sure if we need to maintain the previous index on (storedAt, id, pubsubTopic) for query performance?
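The two suggestions above can be sketched together (illustrative table/index names on a simplified schema, assuming SQLite; this is not the actual migration script): messageHash alone as the primary key, with a non-unique secondary index kept on the previous key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE message (
        id BLOB,
        messageHash BLOB PRIMARY KEY,
        pubsubTopic BLOB,
        storedAt INTEGER NOT NULL
    )
""")
# Keep the previous (storedAt, id, pubsubTopic) tuple as a plain index so
# existing queries on these attributes stay fast.
conn.execute("CREATE INDEX i_msg_query ON message (storedAt, id, pubsubTopic)")

conn.execute("INSERT INTO message VALUES (x'01', x'aa', x'00', 100)")
try:
    # A second row with the same messageHash is rejected by the primary key.
    conn.execute("INSERT INTO message VALUES (x'02', x'aa', x'00', 200)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

assert duplicate_rejected
```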

Contributor Author


  1. Answered elsewhere.
  2. For query performance: input from @Ivansete-status?

Collaborator


ah yes! You are right @jm-clius, the messageHash will be the primary key

Contributor Author


ah yes! You are right @jm-clius, the messageHash will be the primary key

In that case, I will open a new PR: this change also depends on the amended digest (which I have already implemented locally in a separate PR against issue #2215), so it is better to handle the removal of storedAt from the primary key in a new PR.

@ABresting
Contributor Author

LGTM in general. Have left some comments on selecting an appropriate primary key and whether we need to maintain the previous index. Since DB migration is a delicate process, we should test the migration from previous version (or two versions?) of nwaku with populated DB to this version.

I have tested the migration on tag versions 15 and 20; it works well on both without any problems.
Also tested the new PR image against the jsWaku test suite, and it was a success!

Collaborator

@Ivansete-status Ivansete-status left a comment


Thanks for this step further, @ABresting!
Overall it looks good. There is just one small detail that I don't quite understand in the SQLite upgrade script.


Comment on lines 24 to 28
(
SELECT COUNT(*)
FROM message_backup AS mb2
WHERE mb2.storedAt <= mb.storedAt
) as messageHash, -- to populate the counter values
Collaborator


Sorry but I don't quite understand this snippet. That doesn't sound like a message hash.

Contributor Author


Basically, during migration we cannot leave the newly added messageHash column NULL or empty for pre-existing messages/DB rows, so this script fills messageHash with a counter value. For example, if there were 50 messages before running the migration script, then after the migration the previous 50 rows in the DB get messageHash values counting from 1 to 50. This way every value in the messageHash column is unique. WDYT @Ivansete-status
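A runnable sketch of that backfill idea (simplified table and made-up values; not the real migration script): each pre-existing row receives a counter derived from how many rows have an earlier-or-equal storedAt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_backup (id INTEGER, storedAt INTEGER NOT NULL)")
conn.executemany("INSERT INTO message_backup VALUES (?, ?)",
                 [(10, 100), (20, 200), (30, 300)])

# Correlated subquery from the migration snippet: the counter for each row is
# the number of rows stored at or before it, giving 1..n for distinct storedAt.
rows = conn.execute("""
    SELECT id,
           (SELECT COUNT(*)
              FROM message_backup AS mb2
             WHERE mb2.storedAt <= mb.storedAt) AS messageHash
      FROM message_backup AS mb
     ORDER BY mb.storedAt
""").fetchall()

assert rows == [(10, 1), (20, 2), (30, 3)]
```

Note that each COUNT(*) scans up to n rows, so without an index on storedAt the whole backfill is quadratic in the table size.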

Member


I wonder if this would take long in a DB with a large number of rows, like those from Status.

Perhaps https://www.sqlite.org/lang_corefunc.html#randomblob could be a good alternative to doing a count for each row inserted, or perhaps https://www.sqlite.org/c3ref/create_function.html could be used to define a function in Nim that calculates the message hash and can be called from SQLite (this is for sure much more complicated to implement).
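A minimal sketch of the randomblob() alternative (illustrative table name; assumes SQLite >= 3.3.13, where randomblob() was introduced): a single linear UPDATE gives every legacy row an independent 32-byte random value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (storedAt INTEGER NOT NULL, messageHash BLOB)")
conn.executemany("INSERT INTO message (storedAt) VALUES (?)",
                 [(100,), (200,), (300,)])

# randomblob() is non-deterministic, so SQLite re-evaluates it for each row:
# every legacy row gets its own 32-byte pseudo-random placeholder value.
conn.execute("UPDATE message SET messageHash = randomblob(32)")

hashes = [row[0] for row in conn.execute("SELECT messageHash FROM message")]
assert all(len(h) == 32 for h in hashes)
assert len(set(hashes)) == len(hashes)  # distinct with overwhelming probability
```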

Contributor Author


I wonder if this would take long in a DB with large number of rows like those from status.

Perhaps this https://www.sqlite.org/lang_corefunc.html#randomblob could be a good alternative to doing a count for each row inserted, or perhaps using https://www.sqlite.org/c3ref/create_function.html and defining a function in nim to calculate the message hash that can be called from sqlite. (this is for sure much more complicated to implement)

I think the current solution is capable of scaling and is the simpler approach here: post-migration, new rows will carry long hash strings in the messageHash column, so backfilling the old rows with counter values keeps them unique and is less resource-consuming in comparison.
WDYT @jm-clius @richard-ramos @Ivansete-status

Contributor Author



Actually, @richard-ramos is right: the COUNT(*) operation is more intensive than randomblob(N), so it would be good to make this change. But we should make sure the function is available on older clients; randomblob() does not exist before SQLite version 3.3.13.

Also, as recommended by @jm-clius, the counter mechanism (i.e. using simple values 1, 2, 3, ...) will then no longer be used.

Contributor Author


Considering that SQLite 3.3.13 was released in 2007, should we proceed with randomblob(N) without a version check?

Contributor


No, it should be possible to check the SQLite version in the ABI used by the version of nwaku currently deployed to the fleets.

Contributor


Seems like 3.40, but it's probably worth checking the history here for any recent upgrades: https://raw.githubusercontent.com/arnetheduck/nim-sqlite3-abi/362e1bd9f689ad9f5380d9d27f0705b3d4dfc7d3/sqlite3_abi/sqlite3.c
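For reference, a client can check the linked SQLite version at runtime before relying on randomblob(); a sketch using Python's sqlite3 binding (nwaku would do the equivalent through the Nim ABI):

```python
import sqlite3

# randomblob() was introduced in SQLite 3.3.13 (2007), so a guard only needs
# to compare the version the client links against with that release.
linked = tuple(int(x) for x in sqlite3.sqlite_version.split("."))
assert linked >= (3, 3, 13), (
    f"randomblob() unavailable in SQLite {sqlite3.sqlite_version}"
)
print("linked SQLite version:", sqlite3.sqlite_version)
```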

Collaborator


Would it be possible to just set this attribute to "empty" or nil?
From my point of view, it doesn't make sense to add this complexity because, in the end, we are not calculating the messageHash the same way the code base does. Or am I missing something?

Contributor Author


No, it should be possible to check the SQLite version in the ABI used by the version of nwaku currently deployed to the fleets.

One question: randomblob() has been there since tag v0.1, so is it safe to assume we can use it even on the earliest clients, without breaking any backward/outdated clients?

Contributor

@jm-clius jm-clius left a comment


Thanks for your patience in this. I think we would need to amend the deterministic message hashing algorithm (#2215) before merging this (and rebase it on that change).
This change is meant to introduce a new unique index column, called messageHash. If we have determined that messageHash is not unique, we first need to fix the underlying logic, otherwise we'll have to migrate existing databases twice (to update the primary key from (storedAt, messageHash) to just messageHash). Afaict the fix for #2215 should be a small change in a single file?

@ABresting
Contributor Author

Thanks for your patience in this. I think we would need to amend the deterministic message hashing algorithm (#2215) before merging this (and rebase it on that change). This change is meant to introduce a new unique index column, called messageHash. If we have determined that messageHash is not unique, we first need to fix the underlying logic, otherwise we'll have to migrate existing databases twice (to update the primary key from (storedAt, messageHash) to just messageHash). Afaict the fix for #2215 should be a small change in a single file?

Thanks for the input @jm-clius. I have launched a PR fix for #2215; upon its merge I will quickly rebase onto that change.

@jm-clius
Contributor

Seeing you've re-requested a review, but afaict the primary key has not been updated to reflect the latest changes in master?

@ABresting
Contributor Author

ABresting commented Nov 22, 2023

Seeing you've re-requested a review, but afaict the primary key has not been updated to reflect the latest changes in master?

I was planning to remove storedAt from the primary key in a follow-up PR, since we follow a modular-changes practice, but let me fold it into this PR then. In production it's better to do a one-time migration than two.

@ABresting ABresting removed the request for review from jm-clius November 22, 2023 15:18
@jm-clius
Contributor

Indeed, we do incremental PRs, but in this case this would necessitate unnecessary migrations.

Contributor

@jm-clius jm-clius left a comment


Thanks!

@ABresting ABresting merged commit aeb77a3 into master Nov 22, 2023
9 of 10 checks passed
@ABresting ABresting deleted the add-new-db-column branch November 22, 2023 16:32
Labels
release-notes Issue/PR needs to be evaluated for inclusion in release notes highlights or upgrade instructions
Development

Successfully merging this pull request may close these issues.

chore: remove storedAt from PRIMARY key in DB
5 participants