Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune old events from sqlite database #4737

Closed
wants to merge 2 commits into from

Conversation

azeemba
Copy link

@azeemba azeemba commented May 17, 2023

Adds a new config dbHistoryDays which defaults to undefined.

At start up, each database handler reads the config. If the value is set, then a reoccuring task is scheduled to clean up old events. Events older than dbHistoryDays are targeted but only a few thousand events are cleaned up per iteration to avoid freezing the server.

Adds unit tests to validate the cleanup logic as well.

Fixes #2822

Adds a new config `dbHistoryDays` which defaults to undefined.

At start up, each database handler reads the config. If the value is
set, then a reoccuring task is scheduled to clean up old events.
Events older than `dbHistoryDays` are targeted but only a few thousand
events are cleaned up per iteration to avoid freezing the server.

Adds unit tests to validate the cleanup logic as well.

Fixes thelounge#2822
Copy link
Member

@brunnre8 brunnre8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for taking this on.
Now, I've some general comments as to how I think this could work, plus a few minor comments in the code directly.

I think this is a bit too limited as is.
For starters, I want the history retention to be configurable by the user, not just the admin.
While in many cases that's the same, it isn't in all cases and not having to edit the server config for something like that is better.

What that means is that we need this to be a hierarchical configuration. Admins sets global limits as they see fit, user can lower that limit as they feel like.

We should also put that configuration into its own top level config object so that we can add more related settings in the future, for instance I would like to add the ability to only delete unimportant status messages (leave, part, nick change, away, back etc) while retaining the actual useful stuff indefinitely (privmsg, notice).

Does this make sense?

Also note, you should probably rebase on top of #4718 as I'll merge that soon (after a bugfix release for 4.4.0)

// Undefined/-1/0 is treated an unlimited.
// The limit is seen as a soft target but not an exact goal. Only a few
// thousand rows are pruned at a time to avoid slowing down the service.
dbHistoryDays: undefined,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be moved into its own top level object, else we end up with a bunch of similar names which are all very long for no real reason.

const deleteBefore = Date.now() - keepNdays * millisecondsInDay;
await this.serialize_run(
`DELETE FROM messages WHERE rowid in (
SELECT rowid FROM messages WHERE time < ? ORDER BY time ASC LIMIT ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you enforcing an ordering here?
While I get the intention behind it, usually the messages in question are not visible to the client.

You could just let the DB do it in whatever order it sees fit, generally you'd assume that you can keep up within a single invocation right?
We have an index on the time, so it's probably relatively easy to order, but still.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do delete in batches of only 1000 events, I felt wrong creating a "hole" in history. Since the index is already there, I think there is no harm in doing it this way right?

@fnutt
Copy link
Contributor

fnutt commented May 19, 2023

I think this is a bit too limited as is. For starters, I want the history retention to be configurable by the user, not just the admin. While in many cases that's the same, it isn't in all cases and not having to edit the server config for something like that is better.

What that means is that we need this to be a hierarchical configuration. Admins sets global limits as they see fit, user can lower that limit as they feel like.

I'm not sure why a user would set it lower than the admin global limit. What is the usecase for this? Would any users want less history to search/scroll? If they want to remove history, they can do this per channel already?

@brunnre8
Copy link
Member

brunnre8 commented May 19, 2023

Admin says, yeah well, storage isn't an issue.
User says, out of privacy issues I'm uncomfortable to have history there for longer than a month.

Easy enough to imagine, no?

Similarly, User A is unlimited as that's my main account, User B is a test account for dev purposes and I just want some history, without actually using much space

Copy link
Member

@brunnre8 brunnre8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.
In my opinion this should still be restructured a bit though, tell me if you disagree with my comments and I try to explain my reasoning a bit better.

this.mergedConfig = this.mergeClientConfig(clientStorageConfig);
}

mergeClientConfig(clientStorageConfig: SqliteConfig) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the right place for that kind of logic.
The config needs to be merged by the core code, not the sqlite specific storage backend.

Yes, currently sqlite is our only implementation of the interface, but that doesn't mean that we should violate the layers.

maxDaysHistory: number | undefined;
}
| undefined;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that type seems odd to me?

You shouldn't set that to an object or undefined, things that want it optional shall say so when they mean it.

Meaning, the user of the type should do

export type ConfigType = {
 // ...
sqliteConfig?: SqliteConfig;
// ...
}

Or whatever, but not that funny type that is intrinsically this way.
That's a hassle to use

//
// `maxDaysHistory`:
// Defines the maximum number of days of history to keep in the database.
// Undefined/-1/0 is treated an unlimited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly undefined
We should have one way and only one way of configuring something.

@@ -37,7 +37,7 @@ describe("SQLite Message Storage (stateful tests)", function () {
// Delete database file from previous test run
await cleanup();

store = new MessageStorage("testUser");
store = new MessageStorage("testUser", undefined as any);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such casts are very brittle... if you can avoid so please do and in this case you trivially could by providing a correct default value

@azeemba
Copy link
Author

azeemba commented Jul 16, 2023

I appreciate the feedback but this has slipped in priority for me. Happy for someone to take over the PR or to close the PR.

@brunnre8 brunnre8 added the Status: needs-new-owner PR abandoned by its original owner and up for grabs label Jul 16, 2023
@brunnre8
Copy link
Member

thanks for your efforts

@brunnre8 brunnre8 closed this Nov 4, 2023
@azeemba azeemba deleted the pruning branch November 4, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: needs-new-owner PR abandoned by its original owner and up for grabs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logging - limit disk usage of logs/<username>.sqlite files, remove old records
3 participants