Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sqlite cleanup functionality #4799

Merged
merged 7 commits into from Dec 26, 2023
Merged

Add sqlite cleanup functionality #4799

merged 7 commits into from Dec 26, 2023

Conversation

brunnre8
Copy link
Member

@brunnre8 brunnre8 commented Nov 4, 2023

Once this is getting hooked up, it'll periodically delete old
messages.

The StoragePolicy can be chosen by the user, currently there's
two versions, delete everything based on age is the obvious.

The other is for the data hoarders among us. It'll only delete
message types which can be considered low value... Types with
a time aspect like away / back... joins / parts etc.

It tries to do that in a sensible way, so that we don't block
all other db writers that are ongoing.
The "periodically" interval is by design not exposed to the user.

Fixes: #2822

@brunnre8
Copy link
Member Author

brunnre8 commented Nov 4, 2023

This does not yet contain a way for a client to make the storage policy more stringent (meaning, delete more aggressively).
That is a feature we could easily add though, given the structure of it.

For now, that should be the bare minimum so people can finally turn their cronjobs off that manually mess with the db

@brunnre8
Copy link
Member Author

brunnre8 commented Nov 4, 2023

query example, for statusOnly:

delete from messages where id in (
    select id from messages where
    time <= 1698515398915
    and type in ('away','back','invite','join','kick','mode','mode_channel','mode_user','nick','part','quit','ctcp','ctcp_request','chghost','topic','topic_set_by')
    order by time asc
    limit 200
)

Now, this seems to drive up the cpu to 100% for quite a while on every cleaning cycle.

sqlite uses the time index, but the string comparison of the types is... bad.

We can create a new index for messages, on type.
That seems to get rid of the CPU spikes, using more storage and ignoring the time index.

Alternatively we can also stop sorting, we don't really need it. It's just a bit easier for the user to understand, when messages are deleted in order, so that they don't have funny holes in conversations when the max age is very low (say a day)

@brunnre8
Copy link
Member Author

brunnre8 commented Nov 4, 2023

This is now running on my live instance, with the added index:

CREATE INDEX msg_type_idx on messages (type);

@brunnre8 brunnre8 added Type: Feature Tickets that describe a desired feature or PRs that add them to the project. Status: needs-review PR not yet reviewed by enough maintainers labels Nov 4, 2023
@brunnre8 brunnre8 force-pushed the sqliteCleanup branch 5 times, most recently from d5a61a1 to 8ae4715 Compare December 23, 2023 10:03
server/command-line/storage.ts Show resolved Hide resolved
server/command-line/storage.ts Show resolved Hide resolved
server/storageCleaner.ts Show resolved Hide resolved
server/storageCleaner.ts Outdated Show resolved Hide resolved
This makes the usage of the function a bit nicer
This allows us to inject a memory db during testing
This is laying the foundation to build a cleaning task that's
sort of database agnostic.
All calls are done by acting on a "DeletionRequest" so interpretation
of the config will go through a single point
Once this is getting hooked up, it'll periodically delete old
messages.

The StoragePolicy can be chosen by the user, currently there's
two versions, delete everything based on age is the obvious.

The other is for the data hoarders among us. It'll only delete
message types which can be considered low value... Types with
a time aspect like away / back... joins / parts etc.

It tries to do that in a sensible way, so that we don't block
all other db writers that are ongoing.
The "periodically" interval is by design not exposed to the user.
Make the cleaner available to users by exposing it as a subcommand
to thelounge storage.

This is recommended to be run whenever the storage policy significantly
changes in a way that makes many messages eligible for deletion.
The cleaner would cope, but it'll be inefficient and can take many hours.
Due to how storage works in sqlite, the space would not actually be
given back to the OS, just marked for future writes.
Hence this also runs a vacuum to compact the DB as much as it can.
@brunnre8 brunnre8 removed the Status: needs-review PR not yet reviewed by enough maintainers label Dec 25, 2023
@brunnre8 brunnre8 merged commit 7f0b721 into master Dec 26, 2023
12 checks passed
@brunnre8 brunnre8 deleted the sqliteCleanup branch December 26, 2023 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Tickets that describe a desired feature or PRs that add them to the project.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logging - limit disk usage of logs/<username>.sqlite files, remove old records
2 participants