-
Notifications
You must be signed in to change notification settings - Fork 698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watch open files for changes #538
Conversation
This is a ground-up rewrite of watcher.rs, to more efficiently support various multiple watcher use cases.
This uses modification times to check if a file has been modified by another process. If an open file with no unsaved changes is modified on disk, the changed file is reloaded silently; If a file changes on disk and _does_ have unsaved changes, we pop an alert if the user tries to save. This last behaviour is definitely not ideal, but seemed like an okay first pass. closes xi-editor#362
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, generally looks good.
.unwrap().get_file_has_changed() { | ||
let err_msg = format!("File {:?} has changed on disk. \ | ||
Please save as something else. \ | ||
I'm sorry if this is annoying.", file_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
rust/core-lib/src/tabs.rs
Outdated
@@ -694,6 +752,40 @@ impl Documents { | |||
} | |||
} | |||
|
|||
/// Handles a file system event related to a currently open file | |||
fn handle_open_file_fs_event(&mut self, event: DebouncedEvent, _peer: &MainPeer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why this function takes a peer if not used. YAGNI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I wrote the method signature before the implementation, and just matched the signature of the other fs event handling function.
rust/core-lib/src/tabs.rs
Outdated
None => return, | ||
}; | ||
|
||
let is_write_conflict = ed.get_file_mod_time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a good point to add a TODO that at present the conflict is determined from file modification time only and not contents; "conflict" would usually imply that.
|
||
/// Returns the modification timestamp for the file at a given path, | ||
/// if present. | ||
pub fn get_file_mod_time<P>(path: P) -> Option<SystemTime> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be curious how resistant all this logic is to races. From my understanding, View.new_view_with_file reads the file, then does set_path which eventually drills down to this call. If the file is changed between these two invocations, I think we can get a situation where it has the old contents but believes the timestamp is fresh.
Getting this right is hard, in such races a lot can go wrong. I'd lean toward not doing anything heroic in this PR, but filing an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we could definitely race here. We could definitely try flock
ing while we do the read / get the modification date without adding much complexity? Seems reasonable, but would add a dep (fs2?). I don't know much about how file locking works in general, possible there's some problem with this approach I'm not aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not expert on flock
and friends, but my understanding is that they give very weak guarantees (my recollection from years back is that they're especially weak on network filesystems, which these days are rarer but not unheard of). I think figuring out the right thing to do is something of an investigation, as I don't think there are any clear, simple wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just musing a bit, but couldn't we also just grab the mod time before and after the read, retrying if there's a change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds plausible. Another possibility is registering the watcher before loading, so that a notification would come in.
rust/core-lib/src/watcher.rs
Outdated
} | ||
} | ||
|
||
fn mode_from_bool(b: bool) -> RecursiveMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_recursive
would be a better name than b
rust/core-lib/src/watcher.rs
Outdated
// Sleep for `duration` in milliseconds if running on OS X | ||
pub fn sleep_macos(duration: u64) { | ||
if cfg!(target_os = "macos") { | ||
thread::sleep(Duration::from_millis(duration)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like it would be better to call sleep rather than duplicating the impl.
rust/core-lib/src/watcher.rs
Outdated
} | ||
|
||
// Sleep for `duration` in milliseconds if running on OS X | ||
pub fn sleep_macos(duration: u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sleep_if_macos
might be better, the current name suggests it's a mac-specific impl. That said, looking at the CI failures, I wouldn't be surprised if this logic needs to change.
rust/core-lib/src/watcher.rs
Outdated
} | ||
|
||
// Sleep for `duration` in milliseconds | ||
pub fn sleep(duration: u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say millis
rather than duration
as a more precise characterization of units.
Thanks for the review. I'm going to let this sit for a little bit and keep running it locally; I haven't played around enough to feel confident that it should come into master. |
I've been running this the past few days without anything terrible happening, so let's try bringing it in. |
Little late to this review so this might be a fairly stupid question, but can you clarify why we need timestamps/md5 at all? Can't we just translate the filesystem watcher event directly to a UI notification? What problem do the extra checks solve? |
@vlovich in the hashing case, it would be nice if a write that didn't change document contents was ignored. In the timestamp case, the main advantage is that we don't need to do anything special when we write the file ourself; because we update our internal timestamp after the write, we can correctly ignore any notifications that result from writes that we initiate. |
I would think hashing would drastically slow down editing large files as you're going to have to hash on every edit, no? You might be able to make it smarter enough to do at least once per FS change although that still seems like an expensive operation for large files to me. Are there any existing editors out there you know of that hash the contents? For the self-update case, would it be perhaps simpler to just unregister the watch file saving & re-register after the save is complete? It has a race condition window but it seems like it would be the same as the last modified timestamp approach. It's also more robust than the timestamp approach as changing file contents without a changing timestamp seems like a realistic scenario that could come up in the real world. |
@vlovich agree, the hashing approach (which is not currently implemented) is only something we would ever do after detecting a write, in which case we would be reading the file anyway; and we wouldn't be comparing to the active buffer contents, but just to a stored hash of that file. I went with the timestamp approach because that's what NSDocument does, and it seemed like a reasonable middle ground; but it's definitely possible that there could be races there; I suspect much of that will be file system dependent? For self-update, registering/unregistering a watcher is not an atomic operation, so it suffers similar problems. It's also very dependent on the underlying API (FSEvent / kqueue / inotify). I'm definitely not a file systems expert, so I can't speak to the likelihood of sequential writes to a file producing identical timestamps. It seems like that would be... bad? If this is actually a common thing that happens, I agree we should look at another approach. I'm hesitant about relying too heavily on the notifier though; I think the filesystem should be the source of truth, and the notifier is just telling us when we might want to go take a look. |
So in the typical case of editing a plain file, you are correct that the timestamps are not going to be surprising. There are actually mount options on Linux which disable modified timestamps too for extra performance so Xi would be broken on those systems - if Xi is at all trying to compete with Vi/Emacs I think that's something to consider, if it's more traditionally Gui stuff then probably not. These kinds of systems are probably not a concern at the moment IMO. The biggest problem space I see is VCS operation(s) to have the file modified without the timestamp changing or the file stay the same but the timestamp changes. Similarly, unzipping a file typically restores the last modified time so it's not out of the realm of possibility that you end up with weird stuff about the timestamps that don't correlate with your assumptions about content changes. I think the model for this particular use case is erring on the side of assume things changed if the FS event tells you it probably did & let the user decide. If you want to be more clever then sure, but I think you want to aim for 0 chance of false positive/false negative (e.g. hashing) rather than trying to avoid notifying the user. Regarding the race condition, yes I agree 100% & the same race condition exists with when you end up reading the timestamp. I think a potential mitigation would be to pick a random file name that's a sibling & rename on top of the original. Then you can tell it's a save because you just ignore the updates for a rename from the random path that no one but you could know. I would think this way the race condition is pretty effectively eliminated. |
This is a first stab at fixing #362.
This is a fairly dumb implementation; no md5, just timestamps.
If we have a file open and it changes on disk we automatically load the changes if we have no local changes, otherwise we set a flag indicating there are changes on disk. In this case, currently, we just don't let you save at that path again.
This behaviour obviously isn't great; I'm still thinking a bit about how this should actually behave. This comes down in general to how we report errors to the client; my feeling is that we should probably define a set of known error codes, (of which
FileHasChangedOnDisk
could be one) and then more nuanced handling of individual errors could be a client concern; for instance in this case the client could prompt the user to overwrite the changed file, or to show a diff, or whatever. (For the former case,do_save
would have to get anoverwrite
flag.)Other things: