Watch open files for changes #538

cmyr · 2018-03-06T23:45:46Z

This is a first stab at fixing #362.

This is a fairly dumb implementation; no md5, just timestamps.

If we have a file open and it changes on disk we automatically load the changes if we have no local changes, otherwise we set a flag indicating there are changes on disk. In this case, currently, we just don't let you save at that path again.

This behaviour obviously isn't great; I'm still thinking a bit about how this should actually behave. This comes down in general to how we report errors to the client; my feeling is that we should probably define a set of known error codes, (of which FileHasChangedOnDisk could be one) and then more nuanced handling of individual errors could be a client concern; for instance in this case the client could prompt the user to overwrite the changed file, or to show a diff, or whatever. (For the former case, do_save would have to get an overwrite flag.)

Other things:

I'd rewritten all the file watching stuff a while ago because my first pass wasn't very flexible. That's included here.
In the Future I think that something at the runloop level should handle watching files, and send notifications up similar to how idle scheduling works.
Should we be taking an md5 of file contents? Maybe?
It's debatable that this should be merged as is, depending on whether the ergonomic improvement in the case where there are no changes in the open file is offset by the ergonomic degradation in the case where there are changes.

This is a ground-up rewrite of watcher.rs, to more efficiently support various multiple watcher use cases.

This uses modification times to check if a file has been modified by another process. If an open file with no unsaved changes is modified on disk, the changed file is reloaded silently; If a file changes on disk and _does_ have unsaved changes, we pop an alert if the user tries to save. This last behaviour is definitely not ideal, but seemed like an okay first pass. closes xi-editor#362

raphlinus

Some comments, generally looks good.

raphlinus · 2018-03-07T17:53:59Z

rust/core-lib/src/tabs.rs

+                    .unwrap().get_file_has_changed() {
+                        let err_msg = format!("File {:?} has changed on disk. \
+                        Please save as something else. \
+                        I'm sorry if this is annoying.", file_path);


raphlinus · 2018-03-07T17:58:46Z

rust/core-lib/src/tabs.rs

@@ -694,6 +752,40 @@ impl Documents {
        }
    }

+    /// Handles a file system event related to a currently open file
+    fn handle_open_file_fs_event(&mut self, event: DebouncedEvent, _peer: &MainPeer) {


Just curious why this function takes a peer if not used. YAGNI?

Fair, I wrote the method signature before the implementation, and just matched the signature of the other fs event handling function.

raphlinus · 2018-03-07T17:59:39Z

rust/core-lib/src/tabs.rs

+                    None => return,
+                };
+
+                let is_write_conflict = ed.get_file_mod_time()


I think this would be a good point to add a TODO that at present the conflict is determined from file modification time only and not contents; "conflict" would usually imply that.

raphlinus · 2018-03-07T18:01:03Z

rust/core-lib/src/tabs.rs

+
+/// Returns the modification timestamp for the file at a given path,
+/// if present.
+pub fn get_file_mod_time<P>(path: P) -> Option<SystemTime>


I'd be curious how resistant all this logic is to races. From my understanding, View.new_view_with_file reads the file, then does set_path which eventually drills down to this call. If the file is changed between these two invocations, I think we can get a situation where it has the old contents but believes the timestamp is fresh.

Getting this right is hard, in such races a lot can go wrong. I'd lean toward not doing anything heroic in this PR, but filing an issue.

You're right, we could definitely race here. We could definitely try flocking while we do the read / get the modification date without adding much complexity? Seems reasonable, but would add a dep (fs2?). I don't know much about how file locking works in general, possible there's some problem with this approach I'm not aware of.

I'm not expert on flock and friends, but my understanding is that they give very weak guarantees (my recollection from years back is that they're especially weak on network filesystems, which these days are rarer but not unheard of). I think figuring out the right thing to do is something of an investigation, as I don't think there are any clear, simple wins.

Just musing a bit, but couldn't we also just grab the mod time before and after the read, retrying if there's a change?

That sounds plausible. Another possibility is registering the watcher before loading, so that a notification would come in.

raphlinus · 2018-03-07T18:09:46Z

rust/core-lib/src/watcher.rs

+    }
+}
+
+fn mode_from_bool(b: bool) -> RecursiveMode {


is_recursive would be a better name than b

raphlinus · 2018-03-07T18:11:25Z

rust/core-lib/src/watcher.rs

+    // Sleep for `duration` in milliseconds if running on OS X
+    pub fn sleep_macos(duration: u64) {
+        if cfg!(target_os = "macos") {
+            thread::sleep(Duration::from_millis(duration));


Feels like it would be better to call sleep rather than duplicating the impl.

raphlinus · 2018-03-07T18:11:56Z

rust/core-lib/src/watcher.rs

+    }
+
+    // Sleep for `duration` in milliseconds if running on OS X
+    pub fn sleep_macos(duration: u64) {


I think sleep_if_macos might be better, the current name suggests it's a mac-specific impl. That said, looking at the CI failures, I wouldn't be surprised if this logic needs to change.

raphlinus · 2018-03-07T18:22:08Z

rust/core-lib/src/watcher.rs

+    }
+
+    // Sleep for `duration` in milliseconds
+    pub fn sleep(duration: u64) {


I would say millis rather than duration as a more precise characterization of units.

cmyr · 2018-03-07T23:29:42Z

Thanks for the review.

I'm going to let this sit for a little bit and keep running it locally; I haven't played around enough to feel confident that it should come into master.

cmyr · 2018-03-10T19:27:31Z

I've been running this the past few days without anything terrible happening, so let's try bringing it in.

vlovich · 2018-03-12T16:00:18Z

Little late to this review so this might be a fairly stupid question, but can you clarify why we need timestamps/md5 at all? Can't we just translate the filesystem watcher event directly to a UI notification? What problem do the extra checks solve?

cmyr · 2018-03-12T16:18:09Z

@vlovich in the hashing case, it would be nice if a write that didn't change document contents was ignored. In the timestamp case, the main advantage is that we don't need to do anything special when we write the file ourself; because we update our internal timestamp after the write, we can correctly ignore any notifications that result from writes that we initiate.

vlovich · 2018-03-12T20:22:39Z

I would think hashing would drastically slow down editing large files as you're going to have to hash on every edit, no? You might be able to make it smarter enough to do at least once per FS change although that still seems like an expensive operation for large files to me. Are there any existing editors out there you know of that hash the contents?

For the self-update case, would it be perhaps simpler to just unregister the watch file saving & re-register after the save is complete? It has a race condition window but it seems like it would be the same as the last modified timestamp approach. It's also more robust than the timestamp approach as changing file contents without a changing timestamp seems like a realistic scenario that could come up in the real world.

cmyr · 2018-03-12T20:53:38Z

@vlovich agree, the hashing approach (which is not currently implemented) is only something we would ever do after detecting a write, in which case we would be reading the file anyway; and we wouldn't be comparing to the active buffer contents, but just to a stored hash of that file.

I went with the timestamp approach because that's what NSDocument does, and it seemed like a reasonable middle ground; but it's definitely possible that there could be races there; I suspect much of that will be file system dependent?

For self-update, registering/unregistering a watcher is not an atomic operation, so it suffers similar problems. It's also very dependent on the underlying API (FSEvent / kqueue / inotify).

I'm definitely not a file systems expert, so I can't speak to the likelihood of sequential writes to a file producing identical timestamps. It seems like that would be... bad? If this is actually a common thing that happens, I agree we should look at another approach. I'm hesitant about relying too heavily on the notifier though; I think the filesystem should be the source of truth, and the notifier is just telling us when we might want to go take a look.

vlovich · 2018-03-12T22:59:19Z

So in the typical case of editing a plain file, you are correct that the timestamps are not going to be surprising. There are actually mount options on Linux which disable modified timestamps too for extra performance so Xi would be broken on those systems - if Xi is at all trying to compete with Vi/Emacs I think that's something to consider, if it's more traditionally Gui stuff then probably not. These kinds of systems are probably not a concern at the moment IMO.

The biggest problem space I see is VCS operation(s) to have the file modified without the timestamp changing or the file stay the same but the timestamp changes.

Similarly, unzipping a file typically restores the last modified time so it's not out of the realm of possibility that you end up with weird stuff about the timestamps that don't correlate with your assumptions about content changes.

I think the model for this particular use case is erring on the side of assume things changed if the FS event tells you it probably did & let the user decide. If you want to be more clever then sure, but I think you want to aim for 0 chance of false positive/false negative (e.g. hashing) rather than trying to avoid notifying the user.

Regarding the race condition, yes I agree 100% & the same race condition exists with when you end up reading the timestamp. I think a potential mitigation would be to pick a random file name that's a sibling & rename on top of the original. Then you can tell it's a save because you just ignore the updates for a rename from the random path that no one but you could know. I would think this way the race condition is pretty effectively eliminated.

cmyr added 4 commits December 6, 2017 15:50

Rewrite FileWatcher

0bb39e3

This is a ground-up rewrite of watcher.rs, to more efficiently support various multiple watcher use cases.

Simplfy watcher implementation

015f511

Merge branch 'master' into better-watcher

d289a52

googlebot added the cla: yes label Mar 6, 2018

raphlinus reviewed Mar 7, 2018

View reviewed changes

cmyr added 2 commits March 7, 2018 14:00

Review feedback

21587e6

Address failing watcher tests

4afbfd8

raphlinus approved these changes Mar 7, 2018

View reviewed changes

cmyr mentioned this pull request Mar 8, 2018

[RFC] standardize reporting of async errors #543

Open

raphlinus mentioned this pull request Mar 8, 2018

November 2017 Roadmap #437

Closed

10 tasks

cmyr merged commit 21ae842 into xi-editor:master Mar 10, 2018

cmyr deleted the better-watcher branch March 10, 2018 19:28

cmyr mentioned this pull request Mar 10, 2018

File change monitoring is susceptible to races #547

Open

Cogitri mentioned this pull request Mar 3, 2019

Tell the frontend when a file has been changed since opening it #1126

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Watch open files for changes #538

Watch open files for changes #538

cmyr commented Mar 6, 2018 •

edited

Loading

raphlinus left a comment

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

cmyr Mar 7, 2018

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

cmyr Mar 7, 2018

raphlinus Mar 7, 2018

cmyr Mar 7, 2018

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

raphlinus Mar 7, 2018

cmyr commented Mar 7, 2018

cmyr commented Mar 10, 2018

vlovich commented Mar 12, 2018

cmyr commented Mar 12, 2018

vlovich commented Mar 12, 2018 •

edited

Loading

cmyr commented Mar 12, 2018

vlovich commented Mar 12, 2018 •

edited

Loading

Watch open files for changes #538

Watch open files for changes #538

Conversation

cmyr commented Mar 6, 2018 • edited Loading

raphlinus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmyr commented Mar 7, 2018

cmyr commented Mar 10, 2018

vlovich commented Mar 12, 2018

cmyr commented Mar 12, 2018

vlovich commented Mar 12, 2018 • edited Loading

cmyr commented Mar 12, 2018

vlovich commented Mar 12, 2018 • edited Loading

cmyr commented Mar 6, 2018 •

edited

Loading

vlovich commented Mar 12, 2018 •

edited

Loading

vlovich commented Mar 12, 2018 •

edited

Loading