Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up startstop_node and add 'tahoe daemonize' #417

Merged
merged 2 commits into from
Sep 19, 2017

Conversation

meejah
Copy link
Contributor

@meejah meejah commented May 17, 2017

This sets the stage for further changes to the startup process so that "async things" are done before we create the Client instance while still reporting early failures to the shell where "tahoe start" is running

"tahoe daemonize" does what "tahoe start" did before. The new "tahoe start" shells out to "tahoe daemonize" under the hood and then monitors the logs for expected start-up messages -- when it informs the user and exits.

@meejah
Copy link
Contributor Author

meejah commented May 17, 2017

Hmm, it seems at least the unit-tests that start introducer etc don't work with the new code (i.e. they're not seeing the startup message in the logs)? Can someone with MacOS test this branch?

Copy link
Member

@exarkun exarkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Some comments inline. Mostly, but not entirely, documentation related or non-actionable. I also tried out the code and managed to start and stop a tahoe node with start, stop, run, and daemonize. Only a couple minor hiccups (also explained inline).

@@ -41,7 +53,13 @@ class Options(usage.Options):
+ stats_gatherer.subCommands
+ admin.subCommands
+ GROUP("Controlling a node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. Weird. :/ Maybe there should be an upstream feature request for subcommand groups?

@@ -29,6 +30,17 @@ def GROUP(s):
if _default_nodedir:
NODEDIR_HELP += " [default for most commands: " + quote_local_unicode_path(_default_nodedir) + "]"


# XXX all this 'dispatch' stuff needs to be unified + fixed up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True statement.

def identify_node_type(basedir):
for fn in listdir_unicode(basedir):
if fn.endswith(u".tac"):
tac = str(fn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is old code just being moved so I guess it shouldn't change ... but if I were going to change it, I would drop the str(fn) (which is probably always the wrong thing to do to a unicode string) so that tac remains unicode. Then I'd made the tuple being iterated over below contain unicode strings as well.

The application code that I can see that uses identify_node_type (at least, that's included in the diff github is showing me right now) expects the return value to be text-ish (for example, it formats it into a string for a person to read). So the old return type of bytes is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree this should almost certainly be unicode. Most of its use is really "in code" in basically switch statements. I didn't want to muck about with existing code "too much" but could be easily convinced to do more "mucking" :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to unicode.

from allmydata.stats import StatsGathererService
srv = StatsGathererService(verbose=True)
else:
raise ValueError("unknown nodetype %s" % self.nodetype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code just being re-indented... So probably doesn't make sense to change it much. But:

  • codecov reports it as uncovered - not sure if that's true or not (multiprocess shenanigans?)
  • clearly should be doing some more data-y dispatch. Most naively:
d = {
    "client": lambda: namedAny("allmydata.client.Client")(self.basedir),
    "introducer": lambda: namedAny("allmydata.introducer.server.IntroducerNode")(self.basedir),
    ...
}
if self.nodetype in d:
    srv = d[self.nodetype]()
else:
    raise ValueError(...)

srv = StatsGathererService(verbose=True)
else:
raise ValueError("unknown nodetype %s" % self.nodetype)
print("SRV {}".format(srv))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray debug print?


def start(config):
class DaemonizeTahoeNodePlugin:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this brand new class be new-style instead of classic? And perhaps have a class docstring.

@@ -166,102 +174,9 @@ def start(config):
else:
verb = "starting"

runner = twistd._SomeApplicationRunner(twistd_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sad about most of the bits being private in Twisted. I suspect this in particular may cause some trouble sooner rather than later (to the extent anything in Twisted happens "sooner") due to the twistd / twist situation and the apparent desire to move things over to twist. OTOH, perhaps that actually means all of the twistd implementation will remain the same until it is deprecated and removed, which wouldn't be so bad for this...

codecov also says this code is uncovered, though. If that's really the case, adding some test coverage here seems particularly important.

On the gripping hand, all runApp did was make a runner and then call run on it... Which is all this code appears to be doing now, albeit with an intervening print. Can the print just happen before a runApp call and avoid the need to touch this private API?

+ [
["daemonize", None, tahoe_daemonize.DaemonizeOptions, "run a node disconnected from terminal"],
["start", None, tahoe_start.StartOptions, "start a node"],
["run", None, tahoe_run.RunOptions, "run a node"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not terribly clear what the difference between "start" and "run" is here. I think the old text included the word "synchronously" which I don't think added a whole lot. I think "run" is "run in the foreground", "start" is "old-style run daemonized, maybe deprecated soon", and "daemonize" is "new-style run daemonized"?

print("Logs are available in '{}'".format(log_fname))
print("Collected for this run:")
print(collected)
sys.exit(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some additional logic may be needed here. For example, tahoe start /path/... --help now displays twistd's help and then this warning about something having gone wrong.

Maybe --help is the only case not handled by the CalledProcessError case above? It seems like most mis-uses of twistd options cause it to exit with a non-zero code, following the exception path above.

I could also imagine the --logfile and/or --logger options causing problems here. Not sure how much effort it is worth going to to make sure every obscure case works exactly as it did before... Maybe better to quickly deprecate this particular invocation and move on. That's the plan, right? Get everyone to use daemonize instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that looks like an edge case just when you provide both a node-directory and --help (i.e. just tahoe start --help does the right thing it seems).

Yes, --logfile does cause problems -- because then we're watching the wrong logs. Hmm :/

The idea is that people who really want daemonization should use tahoe daemonize and that most users should use tahoe run along with their favourite "thing that runs daemons". So yes: deprecated tahoe start...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two ways to deal with the --logger option: if any logger options are provided, we say we can't watch the logs and exit immediately or we try to grok the logger option enough to see if it's a file and then watch that. If it's syslog, then "exercise to the user".

My suspicion is that users with strong feelings about logging who pass options are able to watch it themselves for errors? But, who knows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing loudly sounds fine to me. I'd say go with the former.

# order to support async startup, we introduced "tahoe daemonize"
# which does more-or-less what "tahoe start" used to. Now, "tahoe
# start" spawns "tahoe daemonize" and then determines whether
# tahoe has started successfully or hasn't (within 5 seconds).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this information belongs one layer up - something describing all of the process management options collectively, in relation to themselves, and maybe just focusing on modern behavior and not trying to document the history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated docs/CLI.rst

@codecov-io
Copy link

codecov-io commented May 18, 2017

Codecov Report

Merging #417 into master will increase coverage by 0.19%.
The diff coverage is 72.2%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #417      +/-   ##
==========================================
+ Coverage   89.64%   89.83%   +0.19%     
==========================================
  Files         140      144       +4     
  Lines       26988    27084      +96     
  Branches     3883     3891       +8     
==========================================
+ Hits        24194    24332     +138     
+ Misses       2070     2021      -49     
- Partials      724      731       +7
Impacted Files Coverage Δ
src/allmydata/scripts/runner.py 83.03% <100%> (+1.95%) ⬆️
src/allmydata/scripts/tahoe_restart.py 35.71% <35.71%> (ø)
src/allmydata/scripts/tahoe_stop.py 39.34% <39.34%> (ø)
src/allmydata/scripts/tahoe_run.py 66.66% <66.66%> (ø)
src/allmydata/scripts/tahoe_daemonize.py 85.41% <85.41%> (ø)
src/allmydata/scripts/tahoe_start.py 87.01% <87.01%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17a734d...263a3f4. Read the comment docs.

@meejah
Copy link
Contributor Author

meejah commented Jun 7, 2017

@exarkun I think I've addressed All The Things from your review. Obviously there's still some tests to write, though.

@pataquets
Copy link
Contributor

What implications does this change has for Docker containers? My concerns for correct Docker execution are:

  • Should not fork/background/daemonize, but remain in foreground
  • Should print err/log/output messages to stderr/stdout

@meejah
Copy link
Contributor Author

meejah commented Jun 21, 2017

@pataquets For Docker you should use tahoe run.

From the perspective of this PR, I don't think the implications for Docker change at all: you shouldn't have used tahoe start before (because it daemonizes) and that's still true. tahoe daemonize is a new command, but also shouldn't be used by Docker.

Ultimately, we'd like to deprecate everything except tahoe run because:

  • there are already daemonization tools (sysmted, daemontools, supervisord, etc)
  • tahoe run is the only thing that works consistently on all platforms (including windows)

For context, this PR is a first-step in refactoring a bunch of internal Tahoe startup APIs to be more async-friendly (that is, factory functions instead of __init__ methods, essentially). Besides being good for the code in general, this will then allow us to bring in the Lease Database code in a better-factored state and ultimately the "cloud backend" branch.

@meejah
Copy link
Contributor Author

meejah commented Jun 21, 2017

(But all that said, if there's something in here that makes Tahoe harder to run under Docker I'd be glad to know!)

@pataquets
Copy link
Contributor

Understood. I'm not very deep in Python and just wanted to make sure, since I can't understand the PR myself.
Thank you very much for the prompt response and explanation, @meejah.

@meejah meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 48002df to 6b22b77 Compare August 1, 2017 21:58
exarkun
exarkun previously requested changes Aug 7, 2017
"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node
command. It creates a new directory and populates it with files that
will allow the "``tahoe start``" and related commands to use it later
on. `tahoe create-node` creates nodes that have client functionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the convention that dictates "``tahoe ...``" in one place and `tahoe ...` in another.



class StartOptions(BasedirOptions):
def identify_node_type(basedir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is the function that the tahoe invite branch wants so it can figure out if it's on an introducer node or not.

opts.getSynopsis()
opts.getUsage()

def test_daemonize_defaults(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails on travis and on my local system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the runner.dispatch() is allowing this to run too far, and it looks at ~/.tahoe (which isn't a real node directory on most systems, so the command bails with "... doesn't look like a directory at all").

I'd start by adding something to the assertion to make this easier to spot:

    self.assertEqual(0, exit_code[0],
                     [exit_code[0], o.getvalue(), e.getvalue()])

and then I think we want two separate steps. The first should build a config from the command with no arguments, and it should look at the resulting config and check that the nodedir is pointing at ~/.tahoe (or whatever the platform-specific value is.. there's a function in scripts/ somewhere that returns that value). Then we should modify the config object to point at a tempdir, and then allow the second part to run (runner.dispatch), and make sure it behaves sensibly.

Or, maybe we should mock out that what-is-the-default-nodedir function, and have it return the tempdir instead?

(what's this testing, exactly? is it that we can run tahoe daemonize on a plausibly-valid nodedir without complaint? Are you patching twistd to catch it at the last moment before it really spawns the daemon?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, mock.patch("allmydata.scripts.tahoe_daemonize._default_nodedir", self.mktemp()) ought to do something useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, make that:

tmpdir = self.mktemp()
base = dirname(tmpdir).decode(getfilesystemencoding())
with patch('allmydata.scripts.tahoe_daemonize.twistd'):
    with patch('allmydata.scripts.common.BasedirOptions.default_nodedir',
               base):
        config = runner.parse_or_exit_with_explanation([
               'daemonize',
           ])

(the BasedirOptions class stashes the default_nodedir value early, and it
also requires the value to be unicode)

self.nodetype = nodetype
self.basedir = basedir
def makeService(self, so):

def startService(self):
# delay this import as late as possible, to allow twistd's code to
# accept --reactor= selection. N.B.: this can't actually work until
# this file, and all the __init__.py files above it, also respect the
# prohibition on importing anything that transitively imports
# twisted.internet.reactor . That will take a lot of work.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no longer clear what this comment refers to. Possibly it wasn't terribly clear before this change, either. I would have guessed it's about the twisted.internet.reactor import. It could possibly be about the allmydata.client.Client / allmydata.introducer.server.IntroducerNode / allmydata.stats.StatsGathererService imports.

try:
service_factory = node_to_instance[self.nodetype]
except KeyError:
raise ValueError("unknown nodetype %s" % self.nodetype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... % (self.nodetype,))

Or maybe "unknown nodetype {}".format(self.nodetype)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah .. I'd prefer .format too, but I'm also trying to be consistent -- as in, a codebase-wide "switch from % to .format" PR would be better, IMO?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think not ... Would you want to review that patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: mmm, probably annoying.

But I like this kind of PR when there are code-base-wide problems, because when you're rebasing/merging any older branches it's way easier to know what the right way to resolve the conflicts is ("change everything to .format(), okay").

@meejah meejah force-pushed the daemonize-tahoe-start branch 7 times, most recently from a72571c to 14dbe9e Compare August 16, 2017 21:21
@meejah meejah dismissed exarkun’s stale review August 16, 2017 21:30

I believe I've covered everything from the review (one followup PR required for the identify_node_type thing)

@meejah meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 8966e4d to 1c1c73d Compare August 16, 2017 21:44
@meejah
Copy link
Contributor Author

meejah commented Aug 17, 2017

Yay windows! exceptions.WindowsError: [Error 206] The filename or extension is too long: 'allmydata.test.cli.test_invite\\Invite\\test_invite_wrong_server_abiliti\\iqbap1\\temp\\servers\\xgru5adv\\storage\\shares\\incoming'

@meejah meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 6c735d6 to a1e3734 Compare September 5, 2017 21:02
@meejah
Copy link
Contributor Author

meejah commented Sep 5, 2017

Anyone have any bright ideas for the windows stuff? It is sad that pathnames are too long

@meejah meejah mentioned this pull request Sep 11, 2017
3 tasks
@meejah meejah force-pushed the daemonize-tahoe-start branch 3 times, most recently from 1f1b8f7 to d3397f5 Compare September 12, 2017 22:48
This sets the stage for further changes to the startup
process so that "async things" are done before we create
the Client instance while still reporting early failures
to the shell where "tahoe start" is running

Also adds a bunch of test-coverage for the things that got
moved around, even though they didn't have coverage before
@warner
Copy link
Member

warner commented Sep 19, 2017

this looks good, meejah and I are landing it now

@warner warner merged commit 263a3f4 into tahoe-lafs:master Sep 19, 2017
@meejah meejah deleted the daemonize-tahoe-start branch September 19, 2017 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants