Implement "event merging" #68
What is the reason to have new and modified be different states? It seems that state determination should be based on interrogation of the endpoint(s) of the transfer(s). In other words, a file is new if it does not exist at the destination, and modified if it does. If I have multiple endpoints, they might somehow get into different states.
The state at the endpoint is also taken into account. However, this is purely about the state of file events in the pipeline queue. The pipeline queue contains files that are yet to be processed (and then transported). Nothing has happened to them yet, except for these events being detected and thus these files ending up in the discovery queue. That's what this is about. The state at the endpoint is only taken into account as soon as we arrive at the transporting step. I hope this answers your concerns.
I understand the purpose of this ticket -- perhaps my question could have been clearer. Is there any need to track three file statuses (new, modified, deleted), or would two decision statuses suffice (needs-to-be-transferred, needs-to-be-deleted)? I suppose there is no need to throw away that information when supplied by the notification library, but there might not be a need to keep it either. If so, instead of worrying about merging events, it's just a binary decision.

It doesn't change the decision, but simplifying like this might have made this design consideration more obvious in the first place.
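For illustration, the collapse from three statuses to two decisions could be sketched as follows. The names and the function are invented for this example; File Conveyor's actual code does not use these identifiers.

```python
# Hypothetical sketch: mapping the three detected event statuses onto the
# two decisions discussed above. Identifiers are illustrative only.
TRANSFER = "needs-to-be-transferred"
DELETE = "needs-to-be-deleted"

def decision(status):
    """Map a detected file event status to the action to take."""
    if status in ("new", "modified"):
        return TRANSFER
    if status == "deleted":
        return DELETE
    raise ValueError("unknown status: %r" % status)
```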
First of all: these design decisions were made >2 years ago, so I can't immediately give you "the complete explanation". Those two decision statuses are indeed all that's necessary to be able to know whether to either transfer or delete the file from one of its destinations. Also, I think there may be some edge cases deep in arbitrator.py that require this knowledge. They might not strictly need to, but for now, I'm not going to change this. Finally, in the case of … I'm afraid I can't give you a better answer than the above at this time.
Oops, that commit should have closed #69! Reopening.
…lity to get an item for a key and to remove an item for a key. Necessary for #68.
There's something stupid I've overlooked: the changed DB schema for
Whenever a change on the file system is detected, File Conveyor adds it to the "discover queue", after which it is moved ASAP to the "pipeline queue". In the pipeline queue, there are items of the form `(input_file, event)`. Now, if a file is changed, that may happen in the form of "1) delete, 2) create". Those will be 2 separate events in the queue. I want to add functionality to merge those events.

Also, when a file is first created, then deleted, then created, then deleted, and so on, and the daemon has so many files to process that all those events are still queued, then it is actually not smart enough: it will delete the file from the CDN, then create it, then delete it, and so on. I.e. it will replay the exact events, instead of being smart enough to merge the events into the actual resulting event. Merging these events will make it faster and more efficient.
This is the only major oversight I'm aware of.
All merging patterns:
* modify + modify = modify
* modify + delete = delete
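A minimal sketch of merging adjacent queued events according to the patterns above. The event names, the table-driven approach, and the function are assumptions for illustration, not File Conveyor's implementation.

```python
# Merging rules for two consecutive events on the same file,
# following the patterns listed above (illustrative sketch).
MERGE_TABLE = {
    ("modify", "modify"): "modify",
    ("modify", "delete"): "delete",
}

def merge_events(events):
    """Collapse a list of queued events for one file into the minimal
    equivalent list, merging adjacent pairs where a rule applies."""
    merged = []
    for event in events:
        if merged and (merged[-1], event) in MERGE_TABLE:
            merged[-1] = MERGE_TABLE[(merged[-1], event)]
        else:
            merged.append(event)
    return merged
```

With rules like these, a backlog such as modify, modify, delete collapses to a single delete, so the daemon no longer replays every intermediate event against the CDN.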
This implies that a new data structure has to be developed: the current `PersistentQueue` data structure does not allow for this merging of events.
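The capability the commit above adds (get an item for a key, remove an item for a key) could look roughly like this in-memory sketch. The real `PersistentQueue` persists its items to a database, so this only illustrates the interface a merging-capable queue would need, with invented method names.

```python
from collections import OrderedDict

class KeyedQueue:
    """In-memory sketch of a FIFO queue with per-key lookup and removal,
    which is what merging events for the same file requires. The actual
    PersistentQueue stores items in a database; this is illustration only."""

    def __init__(self):
        self._items = OrderedDict()

    def put(self, key, item):
        # Re-putting an existing key is where event merging would hook in.
        self._items[key] = item

    def get(self, key):
        """Return the queued item for this key, or None."""
        return self._items.get(key)

    def remove(self, key):
        """Remove the queued item for this key, if present."""
        self._items.pop(key, None)

    def popleft(self):
        """Dequeue and return the oldest (key, item) pair."""
        key, item = next(iter(self._items.items()))
        del self._items[key]
        return key, item
```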