Option to disable fsync #17

Open
untitaker opened this Issue Jul 26, 2016 · 8 comments

Owner

untitaker commented Jul 26, 2016

Unsure if anybody needs this. I might need it because in one use case I'm writing a lot of files (to different filenames) and only need a guarantee that a SIGKILL won't leave a partially written file at the target location (tmpfiles are irrelevant).
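A minimal sketch of that weaker guarantee, assuming a POSIX filesystem where a rename within one directory is atomic. The helper name `write_replace_no_fsync` is hypothetical and not part of this library. It issues no fsync at all, so it only covers a killed process, not a crash or power loss.

```python
import os
import tempfile


def write_replace_no_fsync(path, data):
    """Write data to a temp file next to path, then rename it into place.

    Hypothetical helper: no fsync is issued, so this only guards against
    the process dying mid-write, not against a crash or power loss.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the target untouched and remove the temp file if present.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is killed before `os.replace` runs, only a stray temp file is left behind and the target keeps its previous contents.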

Also this might be a problem with SSDs, as mentioned in #6

Owner

untitaker commented Jul 26, 2016

Probably should be done through subclassing, but this is not cleanly possible at the moment.

mozillazg commented Jun 26, 2018

I recently had the same use case. After disabling fsync, a program that writes 900+ files got an 8x speedup (from 8s to 1s) on Linux.

Owner

untitaker commented Jun 26, 2018

@mozillazg removing the fsync for the file breaks atomicity, which is the point of this library. I think you removed this one, right?

We can add an option to remove the fsync for the directory though.
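To make the two fsync calls being discussed concrete, here is a rough sketch with one flag for each. The function `atomic_write_sketch` and its `fsync_file` / `fsync_dir` parameters are hypothetical names for illustration and are not this library's API. The directory fsync is skipped on platforms without `os.O_DIRECTORY`.

```python
import os
import tempfile


def atomic_write_sketch(path, data, fsync_file=True, fsync_dir=True):
    """Write-then-rename with independently switchable fsync calls.

    Hypothetical illustration only; names and defaults are assumptions.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if fsync_file:
                # Ask the OS to push the file contents to disk before
                # the rename makes them visible at the target path.
                os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if fsync_dir and hasattr(os, "O_DIRECTORY"):
        # Sync the directory entry so the rename itself is persisted.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling it with `fsync_dir=False` keeps the file fsync and drops only the directory one, which is the variant proposed above.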

mozillazg commented Jun 26, 2018

@untitaker Yes, you are right, I removed that (both for the file and the directory).

Is the fsync for the file mandatory? I'm not sure whether the document below is relevant: does rename behave like read-after-write?

After a write() to a regular file has successfully returned:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

I'll test removing only the fsync for the directory.

Owner

untitaker commented Jun 26, 2018

@mozillazg read after write without fsync will work fine.

The fsync is for when your computer loses power or gets a bluescreen: it minimizes the chance of a partially written file (either the old or the new version will be there).

mozillazg commented Jun 27, 2018

@untitaker Thanks for your reply.

On ext4, when auto_da_alloc is enabled (the default), it looks like removing the fsync for the file is safe too:

auto_da_alloc(*)	Many broken applications don't use fsync() when 
noauto_da_alloc		replacing existing files via patterns such as
			fd = open("foo.new")/write(fd,..)/close(fd)/
			rename("foo.new", "foo"), or worse yet,
			fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
			If auto_da_alloc is enabled, ext4 will detect
			the replace-via-rename and replace-via-truncate
			patterns and force that any delayed allocation
			blocks are allocated such that at the next
			journal commit, in the default data=ordered
			mode, the data blocks of the new file are forced
			to disk before the rename() operation is
			committed.  This provides roughly the same level
			of guarantees as ext3, and avoids the
			"zero-length" problem that can happen when a
			system crashes before the delayed allocation
			blocks are forced to disk.

https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
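Since that behaviour depends on a mount option, a program could check for it before skipping the file fsync. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes `noauto_da_alloc` appears in the options column of `/proc/mounts` when it is set. It does not check the `data=ordered` condition the documentation also mentions.

```python
def ext4_mounts_without_auto_da_alloc(mounts_text):
    """Return mount points of ext4 filesystems mounted with noauto_da_alloc.

    mounts_text is expected in /proc/mounts format:
    device, mount point, filesystem type, comma-separated options, ...
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "ext4" and "noauto_da_alloc" in options.split(","):
            result.append(mount_point)
    return result
```

On Linux the input would be the contents of `/proc/mounts`. An empty result only means the option was not explicitly disabled on any ext4 mount; it says nothing about other filesystems.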

Owner

untitaker commented Jun 27, 2018
