When no `fileDeletionDelayAfterSync` attribute is set, no deletion should happen at all #108

Open
kneedrag opened this Issue Mar 9, 2012 · 13 comments

Projects

None yet

6 participants

@kneedrag

Mosso and symlink_or_copy transporters. FileConveyor deletes all the source files. A workaround is to set the fileDeletionDelayAfterSync to an extremely high number that hopefully would never be hit. Would be better to find out why this is occurring and fix it.

This workaround seems to be working for me:
rule for="items" label="cdn" fileDeletionDelayAfterSync="60000000000000000"

@satheshf12000

@kneedrag Did you find a work around yet? If so, please post it here for the community. Thanks.

@kneedrag

I found adding this to my "rule worked fine...

fileDeletionDelayAfterSync="60000000000000000"

Then periodically I use sqlite from the commmand line, and open the persistent_data.db. From there you can issue the command:

delete from files_to_delete_list where id > 0 ;

Not the best solution, there are likely others -- but this has served me fine....

Good luck.....

@mnestor

I don't even have a delete section on mine and it nuked everything. Ugh!

@mnestor

Probably not the best python code

Change [fileconveyor/arbitrator.py:778]
if rule["deletionDelay"] > 0:

To
if rule["deletionDelay"] is None:
self.logger.warning("Not going to delete '%s'" % (input_file))
elif rule["deletionDelay"] > 0:

@satheshf12000

@kneedrag Thanks & yeah for now that seems to be the only solution..

@mnestor Thanks for the code .. I will try it..

@chrisivens

I've been looking into this and it looks like all rules and their deletion settings are applied to every file. This didn't make sense to me so I borrowed the code from the filter section to see if the file matched the rule before applying the delete timeout.

I've made a patch to show you guys what I mean.

diff --git a/fileconveyor/arbitrator.py b/fileconveyor/arbitrator.py
index 2eec1f3..f755cf5 100644
--- a/fileconveyor/arbitrator.py
+++ b/fileconveyor/arbitrator.py
@@ -28,7 +28,7 @@ from persistent_list import *
 from fsmonitor import *
 from filter import *
 from processors.processor import *
-from transporters.transporter import Transporter
+from transporters.transporter import Transporter, ConnectionError
 from daemon_thread_runner import *


@@ -774,15 +774,21 @@ class Arbitrator(threading.Thread):
                 # does not make sense, of course).
                 for rule in self.rules:
                     if event != FSMonitor.DELETED:
-                        if rule["deletionDelay"] > 0:
-                            self.lock.acquire()
-                            self.files_to_delete.append((input_file, time.time() + rul
-                            self.logger.warning("Scheduled '%s' for deletion in %d sec
-                            self.lock.release()
-                        else:
-                            if os.path.exists(input_file):
-                                os.remove(input_file)
-                            self.logger.warning("Deleted '%s' as per the '%s' rule." %
+                        if input_file.startswith(self.config.sources[rule["source"]]["
+                            (rule["filter"] is None
+                            or
+                            rule["filter"].matches(input_file, file_is_deleted=False))
+                            
+                            if rule["deletionDelay"] > 0:
+                                self.lock.acquire()
+                                self.files_to_delete.append((input_file, time.time() +
+                                self.logger.warning("Scheduled '%s' for deletion in %d
+                                self.lock.release()
+                            else:
+                                if os.path.exists(input_file):
+                                    os.remove(input_file)
+                                self.logger.warning("Deleted '%s' as per the '%s' rule
+

                 # The file went all the way through the pipeline, so now it's safe
                 # to remove it from the persistent 'files_in_pipeline' list.

And here's the commit in my fork

I expect at some point we need to make sure the delete rule is set to start with as per mnestor's suggestion

@mrfelton

I have this fixed in my fork as per mnestor 's suggestion. systemseed/fileconveyor@68c40eb

@chrisivens

I had to deliberately break file deletion on my fork on purpose. I'm running glusterfs and I'm not sure whether it is causing some issues with pynotify or something else in the chain.

Moving to a new box in a week or so, so I'll be checking out whether mrfelton's fork works well. I might make a hybrid for a bit with broken deletion but good logging to make sure it is behaving.

It'll be a production box and I can't risk deletions so I'm erring on the side of caution.

Incidentally, is there a good migration strategy for fileconveyor? If the paths are the same I can't see an issue but I guess with the pickling it makes it harder to move things.

@wimleers wimleers added a commit that closed this issue Oct 14, 2012
@mrfelton mrfelton Ensure that source files are only deleted if deletionDelay is set. Fi…
…xes a bug introduced in #100. Closes #108.
225d1bd
@wimleers wimleers closed this in 225d1bd Oct 14, 2012
@wimleers
Owner

This is obviously a super critical bug, that was introduced in #100. My apologies. This should never have happened, of course.

I merged @mrfelton's systemseed/fileconveyor@68c40eb, with an updated commit message and an added trailing period. Thanks, @mrfelton!

@chrisivens

I still have a feeling this needs to check the individual rules for deletion delay. I'll perhaps re-read the code to see if I'm right but if memory serves me correctly. A deletion delay setting on any rule seemed to set the value on a file regardless of whether it matched.

@wimleers
Owner

Can you expand on what you mean, @chrisivens? It's not entirely clear to me (but perhaps that's because I haven't got the code in my head anymore).

@wimleers wimleers reopened this Nov 1, 2012
@chrisivens

Likewise I don't think I can be as clear as I should be but I'll try and paint a scenario.

Lets say you have 2 rules set up, one has deletion delay set and one has nothing. The system notices a change on a file and checks on the actions that should be performed. It then checks to see if deletiondelay has been set. From memory, it doesn't check a specific rule match against the file, it'll check both those rules and if it find a deletiondelay setting it'll use it.

It might be for rules with paths that overlap only but I can't be certain. Either way, I added a check (borrowed from elsewhere in the code) to see whether the file is affected by a rule or not before setting the delete time if needed.

I could have things totally wrong but I did add that code (above) as a reaction to delete times being added where they shouldn't.

EDIT:
looking at the code very quickly, it says for each rule, check the deletion delay setting and apply if found. No check to see if it matches against the file. All rules will get checked and presumably, the last rule with a deletiondelay value will be the one that sticks.

@wimleers
Owner

You're right that multiple rules might apply to a single file and thus might cause conflicting behavior (in that one assumes the file continues to exist, the other assumes it gets deleted). The interesting thing then is that the rule without the deletionDelay will sync the deletion to its targets…

Without deletionDelay, multiple rules applying to a single file wasn't an issue, since the source was essentially always untouched. Now it is.

How do you propose we solve this? AFAICT the only clean solution here is to very clearly document that once you start using deletionDelay on a rule, you must ensure that no other rule will govern the same file.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment