Improving Sandman Notifications (no deletion until user warned) #31

michael-grace · 2022-05-06T09:05:51Z

Sandman Notifications

This is a summary to the logic behind Sandman's notifications.

Files

For every file:

(note: there is other logic here, i.e. for detecting staged files, corruptions etc., but I'm only summarising what we care about)

Is it in Limbo?

If it is, we check if it has passed the limbo threshold. If it has, we delete it. There are no notifications here. And we're done.

Has it passed the deletion threshold?

If it has:

move the file to Limbo
delete the original
PERSIST: Status: Deleted, Notified: False

If it hasn't:

we look to see which warning thresholds the file has passed

for every passed warning threshold (based on the file age, doesn't touch DB):

persist: Status: Warned, Notified: False, Time: Warning Time

Emails

For every stakeholder in the system:

Get anything with Deleted status, NOT notified to that stakeholder
For each warning threshold:
- Get anything with Warning status of that time NOT notified to that stakeholder

Send the emails.

Mark everything as notified.

Persisting

We persist the group
We persist the file
We persist the state.
- This means adding the state to status table
- If it is a warning status, we add the time linked to that status to warnings table

To mark something as notified, we add the status and the stakeholder to notifications table.

Summary

It adds the required notifications to the DB marked as not notified. Then, it picks up all the non-notified stuff for that stakeholder, emails them, and then marks it notified.

Notes

The status is per file, so a Deletion status will replace a Warning status.
The times are per warning status, so a more recent one will replace an older one.
If the first time sandman sees a file is past it's deletion threshold, it'll delete the file without ANY warning. Just an email to say it got deleted.
Sandman will send a notification based on when it should have sent. i.e., if the "10 Days" notification was scheduled for Monday, and it didn't run until Thursday, it wouldn't say "6 Days", it would say "10 Days". But the deletion is still in 6 days.

Design

From the design doc:

These lists will need to be persisted to disk, rather than kept in memory, such that:

Importantly, the list of files that have been staged for archival will be used in the draining phase (the way in which they're stored must therefore take this into account).

When files have exceeded a warning checkpoint, they will only be included in that e-mail once (e.g., if the sweep is run daily and there are checkpoints at 72 and 24 hours, this will ensure the 72 hour warning won't also be sent during the 48 and 24 hour sweeps).

In the event of failure, actions that were performed before the failure can still be reported on retrospectively.

Note that the intended deletion time will also need to be stored, in case the mtime of a file changes. That is, if this happens, then it's possible that a previous warning will become re-eligible. For example:

A file's age exceeds the 72 hour warning checkpoint and an e-mail is sent.

The user updates the file, which changes its mtime, no longer making it eligible for automatic deletion.

Eventually, the 72 hour warning relative to the new mtime is again exceeded; another e-mail is expected, despite a 72 hour warning being previously sent.

Note that when files are completely unlinked, their inodes are recycled by the operating system. As such, they should not be used as unique keys in any persistence model, unless they are respectfully recycled.

There's been discussions about whether we needed state to be stored - simply, yes, because we need to know if:

a notification has already been sent or not for a time period
any failure between an action happening and the email being sent (i.e. a failure on deleting another file) won't affect the contents of the email, assuming there isn't a failure in persisting the state of a file when the file is originally actioned. In simple, it's writing the email as it goes, so if something happens, the work is already done and saved instead of being lost.

As this is relational data (files -> different statuses -> statuses notified for various stakeholders), this DB is a good approach, even if the groups and group owners didn't need to be stored in it.

Timeline

    W1       W1             W1 W2       W2          W2 DEL          DEL
    V        V              V  V        V            V  V            V
    |__________________________|________________________|________________>
 warning                     warning                  deletion
threshold                   threshold                threshold
    1                           2

This shows a timeline with two warning thresholds for a file and the deletion point. Above shows what will be sent if Sandman first sees that file at that point in the timeline. It'll send the most recent warning notification that is hasn't sent (stored as whether the stakeholder had been notified in the DB). If Sandman first sees a file after its deletion point, it'll just delete it without warning. (Note: it's just adding it to Limbo, it still gets its full time in Limbo).

This is Bad

We need to ensure at least one notification is sent, so when we come to check if the file can be deleted, we should query the database to see if there is a Warning state that has been notified to at least one stakeholder. If not, it will not move the file to limbo, and will add a Warning status with the time being the Sandman running interval (i.e. one day). This means, the user will get a one day warning without us having to modify the times of the file. The next time Sandman runs, it'll see that the file is past its threshold and that a notification has been sent, so it'll happily delete the file.

Code Snippets

bin/sandman/sweep.py:323

This can have an extra check in to see if there is a warning marked as notified: True in the DB.

log.info(
    f"Deleting: {file.path} has passed the soft-deletion threshold")
if self.Yes_I_Really_Mean_It_This_Time:
    # 0. Instantiate the persisted file model before it's
    #    deleted so we don't lose its stat information
    to_persist = file.to_persistence()

    # 1. Move file to Limbo and delete source
    limboed = vault.add(Branch.Limbo, file.path)
    touch(limboed.path)
    assert hardlinks(file.path) > 1
    try:
        file.delete()  # DELETION WARNING
        log.info(f"Soft-deleted {file.path}")

    except PermissionError:
        log.error(
            f"Could not soft-delete {file.path}: Permission denied")
        return

    log.info(f"{file.path} has been soft-deleted")

    # 2. Persist to database
    self._persistence.persist(
        to_persist, State.Deleted(notified=False))

bin/sandman/sweep.py:359

This is how we add warning notifications to be sent.

self._persistence.persist(
    to_persist, State.Warned(notified=False, tminus=tminus))

bin/sandman/sweep.py:131

Although we can filter out these notifications, currently it only does it by the configured notification periods, so we can't trivially add the notification period sandman run interval.

# Warned files that require notification
for tminus in config.deletion.warnings:
    to_warn = _files(State.Warned, tminus=tminus)

This being said, we could (assuming we don't want this interval to be a typical warning), add it as an extra configuration option, and have something like

for tminus in (*config.deletion.warnings, config.sandman_run_interval):
    ...

If the stakeholder doesn't have any notifcations for the run_interval time period, it'll just skip it, exactly the same as if the user doesn't have warnings for one of the normal warning time periods.

bin/sandman/sweep.py:108

This function is where the query is done: note the query is done with notified: False

def _files(
        state: T.Type[core.persistence.base.State], **kwargs) -> FileCollection.User:
    """
    Filtered file factory for the current stakeholder in
    this context management stack with the given state
    """
    state_args = {"notified": False, **kwargs}
    criteria = Filter(state=state(
        **state_args), stakeholder=stakeholder)
    return stack.enter_context(
        self._persistence.files(criteria))

The text was updated successfully, but these errors were encountered:

sb10 · 2022-05-06T10:25:46Z

I'm not clear on its behaviour when there's already something in the db for the file. From your description it sounds like it always just replaces the entry with a new not-notified entry? But then you'd get multiple emails about the same thing?

sb10 · 2022-05-10T10:35:05Z

Read and understand notification code, write it up

sb10 · 2022-05-10T10:37:35Z

On testing, 2 unexpected things happened:

It sent an email about a file location that sandman wasn't just run on.
The email contained the same fofn for all 3 warning time points.

michael-grace · 2022-05-11T12:05:22Z

Corrections to the Above

The status is not "one status per file", it can be "many statuses to one file". Each warnings status will have its own tminus time.
This also means that a deletion status is added as another status to the file.
Deletion statuses are deleted by the schema.sql script at the start of each sandman run (so, a file will be marked as deleted status in the DB, then that status will be cleared on the next run)
The warning statuses (and therefore the files) aren't deleted, though files for archival are
It will send notification for all non-notified warning thresholds. Note: this will include even if the warning threshold configuration is actually changed during the warning period.

This means that the diagram should be:

                               W1       W1          W1
    W1       W1             W1 W2       W2          W2 DEL          DEL
    V        V              V  V        V            V  V            V
    |__________________________|________________________|________________>
 warning                     warning                  deletion
threshold                   threshold                threshold
    1                           2

As described above, sandman sends all notifications in the DB marked unnotified, even if the sandman run doesn't run on a directory referred to by one of those notifications.

michael-grace · 2022-05-18T14:44:38Z

don't move file to limbo if no warning sent - send warning instead

michael-grace · 2022-05-18T14:45:07Z

As an admin, if sandman can't run on a given day, I don't want it to skip warnings, so that users don't get surprised by unwarned deletions

Added: - Feature where files will only be deleted if they've been warned at least once - Warning period of sandman run interval for files that are to be deleted, but haven't been warned to anyone - Tests for connecting to postgres DB - Postgres instance within Travis - Tests for SQL string generation Modified: - Config tests using environment variables Removed: - DummyPersistence in testing (replaced with postgres)

sb10 added the documentation Improvements or additions to documentation label May 6, 2022

michael-grace mentioned this issue May 24, 2022

Files Get At Least One Warning #34

Merged

sb10 closed this as completed in #34 May 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving Sandman Notifications (no deletion until user warned) #31

Improving Sandman Notifications (no deletion until user warned) #31

michael-grace commented May 6, 2022

sb10 commented May 6, 2022

sb10 commented May 10, 2022

sb10 commented May 10, 2022

michael-grace commented May 11, 2022

michael-grace commented May 18, 2022

michael-grace commented May 18, 2022

Improving Sandman Notifications (no deletion until user warned) #31

Improving Sandman Notifications (no deletion until user warned) #31

Comments

michael-grace commented May 6, 2022

Sandman Notifications

Files

Emails

Persisting

Summary

Notes

Design

Timeline

This is Bad

Code Snippets

sb10 commented May 6, 2022

sb10 commented May 10, 2022

sb10 commented May 10, 2022

michael-grace commented May 11, 2022

Corrections to the Above

michael-grace commented May 18, 2022

michael-grace commented May 18, 2022