Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safer campaignd file commits #818

Merged

Conversation

Projects
None yet
3 participants
@shikadiqueen
Copy link
Member

commented Oct 10, 2016

See first commit message.

shikadiqueen added some commits Oct 10, 2016

campaignd: Add wrapper for atomic commits of crucial files
As the 2016-10-07~09 downtime incident shows, it is paramount to take
further steps in guaranteeing that the server can't corrupt its own data
files (especially the add-ons database) when receiving
inappropriately-timed signals.

This commit adds and deploys an ostream wrapper that requires callers to
explicitly commit the results to disk when finished writing to the
stream, so that only then the real destination file is overwritten with
the working copy (a temporary in the same dir). This way, there should
never be a situation in which the destination is left in an inconsistent
state due to a signal or exception.

The temporary receives a predictable name right now in the interest of
simplicity, since we are more or less in control of the target directory
anyway. We definitely don't want it to be an unlinked file since that
would make it impossible for admins to inspect and compare the temporary
against the original afterwards.

The code makes some assumptions about the nature of the return value of
filesystem::ostream_file() which will never be broken in this stable
branch, which is why one helper function is in campaignd land rather
than in the global filesystem API for now. This should probably be
rectified when forward-porting to master. Maybe.

Nothing of this will work reliably on Windows but we don't care. There's
only one machine in the world where we support running campaignd at this
time and it runs Linux.
@GregoryLundberg

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2016

A general comment based upon comments on IRC which may not apply here ...

You talked about signal handling and it occurs to me that you might want to consider the following definitions. These are generally how we thought of these 'back in the day' and it's up to you:

SIGINT -- An external change has occurred such as a configuration file. If possible, change to the new configuration for all new work; but allow all ongoing work to complete using the old configuration. Otherwise: perform a shutdown the same as SIGTERM, but the daemon should, however, return to normal service after loading the new configuration.

SIGTERM -- Shutdown requested. Do not accept new work, but finish any already started. At some point the daemon should actually shut down, but it may take a long time.

SIGKILL -- Panic shutdown. It is likely the entire system is going down very soon. Do the bare minimum to prevent corruption and shut down as quickly as possible.

@shikadiqueen

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2016

@GregoryLundberg Your description of SIGINT sounds like the conventional handling of SIGHUP by Unix daemons, actually, not SIGINT. Also, per POSIX, SIGKILL and SIGSTOP are unable to be caught or ignored, and Linux complies with this requirement. There's nothing a program can do about SIGKILL other than be designed with the possibility in mind (i.e. what this PR does).

@shikadiqueen

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2016

Also, worth noting that SIGHUP is already handled with the expected semantics (schedule a config reload for the next processing cycle), and SIGTERM and SIGINT are as well (schedule orderly exit for next cycle) as of commit 019e7a7. SIGINT is the signal programs attached to a supporting terminal get when ^C or the equivalent is sent to it, which is actually something that gets used a lot with campaignd when performing maintenance.

@GregoryLundberg

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2016

Well, OK. I guess I'm showing my BSD roots.

@gfgtdf

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2016

I think std::remove and std::rename do not support utf8 filenames on windows.

@GregoryLundberg

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2016

std::remove ???

I just checked and it appears Windows support for Unicode filenames uses utf-16le so I believe gfgtdf is correct.

@shikadiqueen

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2016

@gfgtdf I don’t see much point into explicitly bringing Boost.Filesystem into the mix here since Windows is already an unsupported platform for this target (the master version won’t even build for Windows since PR #671 was merged) — hence I decided to not bother with Boost.Filesystem here after my disappointing experiences with it. Plus, the big WARNING comment.

@gfgtdf

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2016

Plus, the big WARNING comment.

Specially with his comment, it sounds like the atomic issue is the onyl issue this code has on windows.

hence I decided to not bother with Boost.Filesystem

But do already use or filesystem functions, the first part this line filesystem::file_exists(dest_name_) && std::remove(dest_name_.c_str()) uses filesystem::file_exists instead of whatever std provides but in the second part you decided to use std::remove instead of filesytem::delete_file.

@shikadiqueen

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2016

@gfgtdf I was lazy and fs::file_exists() is more expressive than using stat(). And if you take a look at campaign_server.cpp you’ll see that this is not the only use of std::remove() a.k.a. C remove().

I’m not sure why you keep insisting on this. Perhaps you are personally interested in being able to run this undocumented pile of crap on Windows? I really don’t see the point.

@shikadiqueen shikadiqueen changed the title [DO NOT MERGE] Safer campaignd file commits Safer campaignd file commits Sep 14, 2017

@shikadiqueen shikadiqueen merged commit 17197cf into wesnoth:1.12 Sep 14, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.