Message history or retention period #106

Open
phillip-white-sociomantic opened this Issue Oct 1, 2015 · 9 comments

Projects

None yet

3 participants

@phillip-white-sociomantic

Is it currently possible to set the history period the messages are kept? I imagine this needs to be somewhere otherwise eventually the server will get slow with thousands of messages. Most chat services have a purging feature which can be configured.

@timabbott
Member

There is currently no way to automatically purge old messages -- many users value the ability to see their message history from years ago, and so that's the default.

There are no material performance issues with storing 50M messages in a single database server's history (though eventually you start to need more disk).

That said, if this feature were desired to save disk (or just to be able to have one's data eventually go away!), it shouldn't be too hard to write (the main challenge being handling caching properly), though we'd want to be very careful about testing since buggy deletion code is a great way to delete all the messages :).

@phillip-white-sociomantic

ok, understood

@timabbott timabbott reopened this Dec 4, 2015
@timabbott
Member

@blablacio is working on this feature!

@timabbott timabbott added this to the 2016 roadmap milestone Apr 29, 2016
@timabbott
Member

Here's some notes on how we should implement this feature:

  • We'll add a retention policy feature for the realm specifying how long messages are kept before being automatically deleted. Implementation-wise, I'd like to use a model where after the N days the messages are deleted from Message, but added to a new table e.g. ArchivedMessage, and kept there for a week before being deleted finally, so that we have time to catch bugs for data is lost permanently.
  • I think it will work to create models like 'class ArchivedMessage(Message)' that inherits from Message and a similar table inheriting from UserMessage, and just add to the new table whatever field(s) we need to keep track of when they'll be deleted. We might need to some set Django meta fields on the classes to ensure that ArchivedMessage rows are only present in the new tables. That approach should minimize the maintenance involved when we modify the Message tables.
  • Since bugs in this system would result in data loss, we should make sure to have a really good test suite for this feature.
  • The command-line tool should be a Django management command that calls library functions inside a new file zerver/lib/retention.py, similar to how zerver/management/commands/fill_memcached_caches.py works.
  • Eventually we'll add a cron job in puppet/zulip/files/cron to run the management command.
  • Once we do the Message stuff correctly, we'll need to do corresponding changes in Attachment and MessageReaction to handle this properly.

I'd like to merge incremental pieces of this implementation as we go rather than building the whole thing before we merge anything. e.g., first add tooling to just determine which messages should be archived with good tests, then add a function to move a set of messages from Message to ArchivedMessage rows with tests, then a function to do the same with ArchivedUserMessage, etc. Ideally we'd be merging code for this feature every day or two, with good tests, rather than building up a big monolithic implementation.

@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Oct 25, 2016
@kkanahin kkanahin retention-policy: Add tool to determine expired messages.
- Add Realm model message_retention_days field to setup
  messages expired period for realm.
- Add migration.
- Add tool to get expired messages for each Realm.
- Add tests to cover tool for getting expired messages.

Fixes #106
7460e14
@kkanahin
Contributor

@timabbott I've done first stage(added tool for getting expired messages). Please review PR #2119. I need your feedback to continue work on it.

@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Oct 25, 2016
@kkanahin kkanahin retention-policy: Add tool to determine expired messages.
- Add Realm model message_retention_days field to setup
  messages expired period for realm.
- Add migration.
- Add tool to get expired messages for each Realm.
- Add tests to cover tool for getting expired messages.

Fixes #106
85accea
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Oct 28, 2016
@kkanahin kkanahin retention-policy: Add tool to determine expired messages.
- Add Realm model message_retention_days field to setup
  messages expired period for realm.
- Add migration.
- Add tool to get expired messages for each Realm.
- Add tests to cover tool for getting expired messages.

Fixes #106
2d26d3d
@kkanahin
Contributor
kkanahin commented Nov 1, 2016

It would be better to use hashes as message identifier do not depend on db ids. It useful when messages are archiving.

@timabbott
Member

Can you explain more what you mean? Hashes of what?

@kkanahin
Contributor
kkanahin commented Nov 2, 2016

sorry I mean uid, https://docs.djangoproject.com/en/1.8/ref/models/fields/#uuidfield which is not connected with ids. Ids values can be broken by vacuum or rewriting free ids values but we need to have independent identidier for each message to restore data or put them to archive

@timabbott
Member

I understand the issue, but I'm not sure that using UUIDs is the best solution.

Having thought about this more, I think we probably need to have some mechanism to present IDs from archived objects from being reused. Because Zulip orders messages by ID in all the UI/display logic, and we don't intend to change that, if a message were to have its ID change through an archive+unarchive process, it would no longer be possible to restore the message the right place in the historical message sort order. So, we can't allow reuse of IDs for things that have been moved to ArchivedMessage but not yet deleted.

At the moment, nothing is required to do that, since we don't have any features other than the rarely used import tool that would cause Zulip to add new messages with an ID that doesn't construct message IDs via autoincrementing; and we could always just have things that might reuse free message IDs like the import tool (./manage.py export --help for details about that system) check IDs it wants to use against e.g. the ArchivedMessage table before trying to use them. I doubt we'll ever have features that reuse old message IDs other than the old import tool.

Assuming I'm missing something here, though, why not just use an autoincrement style ID column on the ArchivedFoo tables, that happens to be a different field from e.g. the message IDs, rather than a uuidfield?

@timabbott timabbott modified the milestone: Zulip roadmap, Old roadmap Nov 18, 2016
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add archiving expired messages by retention period.
- Add models to store expired messages and expired user messages data after
  removing from original Message and UserMessage db tables.
- Add django migrations to create tables for archive models.
- Add tools for moving expired messages to archive tables.
- Add tools for deleting expired messages from Message and UserMessage tables.
- Add tests to check archiving and deleting tools.
- Add manage command to archive expired messages.

Fixes #106
b974daf
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add archiving attachments feature.
- Add model with migration for attachments archiving.
- Add archiving attachments to archiving tool.
- Add tests.

Fixes #106
919d7c2
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add removing archived data feature.
- Add retention period for archived data to django settings.
- Add method for removing archived data.
- Add manage command for removing old archived data
- Add test to check removing archived data.

Fixes #106
6ebf89e
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add cron jobs for archiving tools.
- Add cron job for moving expired messages with attachments
  to archive tables.
- Add cron job for removing old archived data from db and
  remove attachments files.

Fixes #106
2f15c42
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add retention period to front-end admin organizatio…
…n settings.

- Add message retention period field to organization settings form.
- Add css for retention period field.
- Add convertor to not negative int or to None.
- Add retention period setting processing to back-end.
- Fix tests.

Fixes #106
99406ad
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period-doc: Add documentation page for retention feature
Fixes #106
193ecf0
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Add restore tools for archived data.
- Add tool to restore archived messages.
- Add tool to restore archived user_messages.
- Add tool to restore archived attachments and
  archived manytomany related records.
- Add management console command to restore archived
  data by realm.
- Add documentation for restoring tools
- Add tests for restoring tools.

Fixes #106
705cd0f
@kkanahin kkanahin added a commit to SmartPeople/zulip that referenced this issue Feb 15, 2017
@kkanahin kkanahin retention-period: Fix wording and typos.
- Add additional comments.
- Rephrase comments to make them more clear.
- Fix typos.

Fixes #106
7dcea2a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment