Split up startstop_node and add 'tahoe daemonize' #417

meejah · 2017-05-17T08:01:23Z

This sets the stage for further changes to the startup process so that "async things" are done before we create the Client instance, while still reporting early failures to the shell where "tahoe start" is running.

"tahoe daemonize" does what "tahoe start" did before. The new "tahoe start" shells out to "tahoe daemonize" under the hood and then monitors the logs for expected start-up messages, at which point it informs the user and exits.

meejah · 2017-05-17T19:18:47Z

Hmm, it seems at least the unit-tests that start introducer etc don't work with the new code (i.e. they're not seeing the startup message in the logs)? Can someone with MacOS test this branch?

exarkun

Thanks. Some comments inline. Mostly, but not entirely, documentation related or non-actionable. I also tried out the code and managed to start and stop a tahoe node with start, stop, run, and daemonize. Only a couple minor hiccups (also explained inline).

exarkun · 2017-05-17T23:09:02Z

src/allmydata/scripts/runner.py

@@ -41,7 +53,13 @@ class Options(usage.Options):
                    +   stats_gatherer.subCommands
                    +   admin.subCommands
                    + GROUP("Controlling a node")


Huh. Weird. :/ Maybe there should be an upstream feature request for subcommand groups?

exarkun · 2017-05-17T23:10:55Z

src/allmydata/scripts/runner.py

@@ -29,6 +30,17 @@ def GROUP(s):
 if _default_nodedir:
    NODEDIR_HELP += " [default for most commands: " + quote_local_unicode_path(_default_nodedir) + "]"

+
+# XXX all this 'dispatch' stuff needs to be unified + fixed up


True statement.

exarkun · 2017-05-17T23:12:11Z

src/allmydata/scripts/tahoe_daemonize.py

+def identify_node_type(basedir):
+    for fn in listdir_unicode(basedir):
+        if fn.endswith(u".tac"):
+            tac = str(fn)


This is old code just being moved so I guess it shouldn't change ... but if I were going to change it, I would drop the str(fn) (which is probably always the wrong thing to do to a unicode string) so that tac remains unicode. Then I'd make the tuple being iterated over below contain unicode strings as well.

The application code that I can see that uses identify_node_type (at least, that's included in the diff github is showing me right now) expects the return value to be text-ish (for example, it formats it into a string for a person to read). So the old return type of bytes is wrong.

I do agree this should almost certainly be unicode. Most of its use is really "in code" in basically switch statements. I didn't want to muck about with existing code "too much" but could be easily convinced to do more "mucking" :)

changed to unicode.
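
A minimal sketch of the unicode-preserving version discussed in this thread. It substitutes `os.listdir` for Tahoe's `listdir_unicode` so that it runs standalone, and the tuple of node-type names is an assumption, since the diff excerpt above doesn't show it.

```python
import os

# Assumed set of node types; the excerpt above doesn't show the real tuple.
NODE_TYPES = (u"client", u"introducer", u"stats-gatherer")


def identify_node_type(basedir):
    """Return the node type (as text) implied by the .tac file, or None."""
    for fn in os.listdir(basedir):  # stand-in for listdir_unicode()
        if fn.endswith(u".tac"):
            tac = fn  # keep it as text; no str() coercion
            break
    else:
        return None  # no .tac file found: not a node directory

    for node_type in NODE_TYPES:
        if node_type in tac:
            return node_type
    return None
```

Because every value involved is text, callers can format the result into a user-facing message without any further conversion.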

exarkun · 2017-05-17T23:21:04Z

src/allmydata/scripts/tahoe_daemonize.py

+                from allmydata.stats import StatsGathererService
+                srv = StatsGathererService(verbose=True)
+            else:
+                raise ValueError("unknown nodetype %s" % self.nodetype)


Old code just being re-indented... So probably doesn't make sense to change it much. But:

codecov reports it as uncovered - not sure if that's true or not (multiprocess shenanigans?)

clearly should be doing some more data-y dispatch. Most naively:

    d = {
        "client": lambda: namedAny("allmydata.client.Client")(self.basedir),
        "introducer": lambda: namedAny("allmydata.introducer.server.IntroducerNode")(self.basedir),
        ...
    }
    if self.nodetype in d:
        srv = d[self.nodetype]()
    else:
        raise ValueError(...)
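
For reference, a self-contained sketch of the same data-driven dispatch. The two classes are hypothetical stand-ins for the real node classes, and `make_service` is an illustrative name; it uses direct references rather than `namedAny` so that it runs without Twisted.

```python
class FakeClient(object):
    """Hypothetical stand-in for allmydata.client.Client."""
    def __init__(self, basedir):
        self.basedir = basedir


class FakeIntroducer(object):
    """Hypothetical stand-in for the introducer node class."""
    def __init__(self, basedir):
        self.basedir = basedir


def make_service(nodetype, basedir):
    # Map each node type to a zero-argument factory; the lambdas defer
    # construction until a type has actually been selected.
    factories = {
        "client": lambda: FakeClient(basedir),
        "introducer": lambda: FakeIntroducer(basedir),
    }
    try:
        factory = factories[nodetype]
    except KeyError:
        raise ValueError("unknown nodetype %s" % (nodetype,))
    return factory()
```

Adding a node type then means adding one dictionary entry rather than another `elif` branch.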

exarkun · 2017-05-17T23:21:15Z

src/allmydata/scripts/tahoe_daemonize.py

+                srv = StatsGathererService(verbose=True)
+            else:
+                raise ValueError("unknown nodetype %s" % self.nodetype)
+            print("SRV {}".format(srv))


Stray debug print?

exarkun · 2017-05-18T00:59:57Z

src/allmydata/scripts/tahoe_daemonize.py


-def start(config):
+class DaemonizeTahoeNodePlugin:


Should this brand new class be new-style instead of classic? And perhaps have a class docstring.

exarkun · 2017-05-18T01:04:30Z

src/allmydata/scripts/tahoe_daemonize.py

@@ -166,102 +174,9 @@ def start(config):
    else:
        verb = "starting"

+    runner = twistd._SomeApplicationRunner(twistd_config)


Ah, sad about most of the bits being private in Twisted. I suspect this in particular may cause some trouble sooner rather than later (to the extent anything in Twisted happens "sooner") due to the twistd / twist situation and the apparent desire to move things over to twist. OTOH, perhaps that actually means all of the twistd implementation will remain the same until it is deprecated and removed, which wouldn't be so bad for this...

codecov also says this code is uncovered, though. If that's really the case, adding some test coverage here seems particularly important.

On the gripping hand, all runApp did was make a runner and then call run on it... Which is all this code appears to be doing now, albeit with an intervening print. Can the print just happen before a runApp call and avoid the need to touch this private API?

exarkun · 2017-05-18T01:11:39Z

src/allmydata/scripts/runner.py

+                    + [
+                        ["daemonize", None, tahoe_daemonize.DaemonizeOptions, "run a node disconnected from terminal"],
+                        ["start", None, tahoe_start.StartOptions, "start a node"],
+                        ["run", None, tahoe_run.RunOptions, "run a node"],


It's not terribly clear what the difference between "start" and "run" is here. I think the old text included the word "synchronously" which I don't think added a whole lot. I think "run" is "run in the foreground", "start" is "old-style run daemonized, maybe deprecated soon", and "daemonize" is "new-style run daemonized"?

exarkun · 2017-05-18T01:17:31Z

src/allmydata/scripts/tahoe_start.py

+        print("Logs are available in '{}'".format(log_fname))
+        print("Collected for this run:")
+        print(collected)
+        sys.exit(1)


I think some additional logic may be needed here. For example, tahoe start /path/... --help now displays twistd's help and then this warning about something having gone wrong.

Maybe --help is the only case not handled by the CalledProcessError case above? It seems like most mis-uses of twistd options cause it to exit with a non-zero code, following the exception path above.

I could also imagine the --logfile and/or --logger options causing problems here. Not sure how much effort it is worth going to to make sure every obscure case works exactly as it did before... Maybe better to quickly deprecate this particular invocation and move on. That's the plan, right? Get everyone to use daemonize instead?

Hmm, that looks like an edge case just when you provide both a node-directory and --help (i.e. just tahoe start --help does the right thing it seems).

Yes, --logfile does cause problems -- because then we're watching the wrong logs. Hmm :/

The idea is that people who really want daemonization should use tahoe daemonize and that most users should use tahoe run along with their favourite "thing that runs daemons". So yes: deprecated tahoe start...

I see two ways to deal with the --logger option: if any logger options are provided, either we say we can't watch the logs and exit immediately, or we try to grok the logger option enough to see if it's a file and then watch that. If it's syslog, that is left as an "exercise for the user".

My suspicion is that users with strong feelings about logging who pass options are able to watch it themselves for errors? But, who knows.

Failing loudly sounds fine to me. I'd say go with the former.
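
A rough sketch of the "fail loudly" option agreed on here. The helper names are hypothetical and the error text is invented for illustration; the option names listed are twistd's logging options. It doesn't handle a short option with its value attached (such as `-lfoo.log`).

```python
# twistd options that redirect logging away from the default log file.
LOGGING_OPTIONS = ("--logfile", "-l", "--logger", "--syslog")


def uses_custom_logging(twistd_args):
    """Return True if any argument selects a non-default log destination."""
    for arg in twistd_args:
        # Handle both "--logfile path" and "--logfile=path" spellings.
        name = arg.split("=", 1)[0]
        if name in LOGGING_OPTIONS:
            return True
    return False


def check_can_watch_logs(twistd_args):
    """Fail loudly, before spawning anything, if the logs can't be watched."""
    if uses_custom_logging(twistd_args):
        raise SystemExit(
            "cannot watch the logs when a custom logging option is given"
        )
```

Checking before the daemon is spawned avoids the situation described above, where the wrong log file is watched.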

exarkun · 2017-05-18T01:26:34Z

src/allmydata/scripts/tahoe_start.py

+    # order to support async startup, we introduced "tahoe daemonize"
+    # which does more-or-less what "tahoe start" used to. Now, "tahoe
+    # start" spawns "tahoe daemonize" and then determines whether
+    # tahoe has started successfully or hasn't (within 5 seconds).


Maybe this information belongs one layer up - something describing all of the process management options collectively, in relation to themselves, and maybe just focusing on modern behavior and not trying to document the history.

updated docs/CLI.rst
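
To make the "watch the logs for up to 5 seconds" behaviour concrete, here is a minimal polling sketch. The function name, the polling interval and any message strings passed in are assumptions; the thread doesn't show the real start-up messages or how the PR reads the log.

```python
import time


def wait_for_startup(log_path, expected_messages, timeout=5.0, interval=0.1):
    """Poll log_path until an expected message appears or timeout elapses.

    Returns the matching message, or None if none was seen in time.
    """
    deadline = time.time() + timeout
    while True:
        try:
            with open(log_path, "r") as f:
                contents = f.read()  # re-read the whole file on each poll
        except IOError:
            contents = ""  # the daemon may not have created the log yet
        for message in expected_messages:
            if message in contents:
                return message
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

A caller would treat a `None` result as a failed start, print what was collected from the log, and exit non-zero.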

codecov-io · 2017-05-18T23:15:38Z

Codecov Report

Merging #417 into master will increase coverage by 0.19%.
The diff coverage is 72.2%.

@@            Coverage Diff             @@
##           master     #417      +/-   ##
==========================================
+ Coverage   89.64%   89.83%   +0.19%     
==========================================
  Files         140      144       +4     
  Lines       26988    27084      +96     
  Branches     3883     3891       +8     
==========================================
+ Hits        24194    24332     +138     
+ Misses       2070     2021      -49     
- Partials      724      731       +7

Impacted Files	Coverage Δ
src/allmydata/scripts/runner.py	`83.03% <100%> (+1.95%)`	⬆️
src/allmydata/scripts/tahoe_restart.py	`35.71% <35.71%> (ø)`
src/allmydata/scripts/tahoe_stop.py	`39.34% <39.34%> (ø)`
src/allmydata/scripts/tahoe_run.py	`66.66% <66.66%> (ø)`
src/allmydata/scripts/tahoe_daemonize.py	`85.41% <85.41%> (ø)`
src/allmydata/scripts/tahoe_start.py	`87.01% <87.01%> (ø)`
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 17a734d...263a3f4.

meejah · 2017-06-07T22:10:52Z

@exarkun I think I've addressed All The Things from your review. Obviously there's still some tests to write, though.

pataquets · 2017-06-21T21:40:15Z

What implications does this change have for Docker containers? My concerns for correct Docker execution are:

Should not fork/background/daemonize, but remain in foreground
Should print err/log/output messages to stderr/stdout

meejah · 2017-06-21T21:57:02Z

@pataquets For Docker you should use tahoe run.

From the perspective of this PR, I don't think the implications for Docker change at all: you shouldn't have used tahoe start before (because it daemonizes) and that's still true. tahoe daemonize is a new command, but also shouldn't be used by Docker.

Ultimately, we'd like to deprecate everything except tahoe run because:

there are already daemonization tools (systemd, daemontools, supervisord, etc)
tahoe run is the only thing that works consistently on all platforms (including windows)

For context, this PR is a first-step in refactoring a bunch of internal Tahoe startup APIs to be more async-friendly (that is, factory functions instead of __init__ methods, essentially). Besides being good for the code in general, this will then allow us to bring in the Lease Database code in a better-factored state and ultimately the "cloud backend" branch.
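
As an illustration of "factory functions instead of __init__ methods", here is a small sketch. It uses `asyncio` purely to stay self-contained, whereas Tahoe itself is built on Twisted; the `Client` class, `allocate_port` and the port number are all hypothetical.

```python
import asyncio


class Client:
    """Hypothetical node object; __init__ only stores ready-made values."""
    def __init__(self, basedir, port):
        self.basedir = basedir
        self.port = port


async def allocate_port():
    # Placeholder for real asynchronous work, e.g. asking the OS for a port.
    await asyncio.sleep(0)
    return 3456


async def create_client(basedir):
    """Factory: do the async work first, then construct a complete Client."""
    port = await allocate_port()
    return Client(basedir, port)
```

The point of the pattern is that `__init__` can't wait on asynchronous results, while a factory can, so the object is never observed half-initialised. A caller would run `asyncio.run(create_client("/path/to/node"))`.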

meejah · 2017-06-21T21:57:53Z

(But all that said, if there's something in here that makes Tahoe harder to run under Docker I'd be glad to know!)

pataquets · 2017-06-21T22:09:38Z

Understood. I'm not very deep in Python and just wanted to make sure, since I can't understand the PR myself.
Thank you very much for the prompt response and explanation, @meejah.

exarkun · 2017-08-07T15:01:44Z

docs/frontends/CLI.rst

+"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node
+command. It creates a new directory and populates it with files that
+will allow the "``tahoe start``" and related commands to use it later
+on. `tahoe create-node` creates nodes that have client functionality


I don't quite understand the convention that dictates "``tahoe ...``" in one place and `tahoe ...` in another.

exarkun · 2017-08-07T17:26:22Z

src/allmydata/scripts/tahoe_daemonize.py



-class StartOptions(BasedirOptions):
+def identify_node_type(basedir):


Oh, this is the function that the tahoe invite branch wants so it can figure out if it's on an introducer node or not.

exarkun · 2017-08-07T17:29:09Z

src/allmydata/test/cli/test_daemonize.py

+        opts.getSynopsis()
+        opts.getUsage()
+
+    def test_daemonize_defaults(self):


This test fails on travis and on my local system.

It looks like the runner.dispatch() is allowing this to run too far, and it looks at ~/.tahoe (which isn't a real node directory on most systems, so the command bails with "... doesn't look like a directory at all").

I'd start by adding something to the assertion to make this easier to spot:

self.assertEqual(0, exit_code[0], [exit_code[0], o.getvalue(), e.getvalue()])

and then I think we want two separate steps. The first should build a config from the command with no arguments, and it should look at the resulting config and check that the nodedir is pointing at ~/.tahoe (or whatever the platform-specific value is.. there's a function in scripts/ somewhere that returns that value). Then we should modify the config object to point at a tempdir, and then allow the second part to run (runner.dispatch), and make sure it behaves sensibly.

Or, maybe we should mock out that what-is-the-default-nodedir function, and have it return the tempdir instead?

(what's this testing, exactly? is it that we can run tahoe daemonize on a plausibly-valid nodedir without complaint? Are you patching twistd to catch it at the last moment before it really spawns the daemon?)

BTW, mock.patch("allmydata.scripts.tahoe_daemonize._default_nodedir", self.mktemp()) ought to do something useful.

Sorry, make that:

    tmpdir = self.mktemp()
    base = dirname(tmpdir).decode(getfilesystemencoding())
    with patch('allmydata.scripts.tahoe_daemonize.twistd'):
        with patch('allmydata.scripts.common.BasedirOptions.default_nodedir', base):
            config = runner.parse_or_exit_with_explanation([
                'daemonize',
            ])

(the BasedirOptions class stashes the default_nodedir value early, and it also requires the value to be unicode)

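
The reason for patching the class attribute rather than the module-level `_default_nodedir` can be shown with a stand-in. The `BasedirOptions` class below is a hypothetical simplification, not Tahoe's real class, and it uses Python 3's `unittest.mock` rather than the `mock` package.

```python
from unittest.mock import patch


class BasedirOptions:
    """Hypothetical stand-in: the default is stashed on the class."""
    default_nodedir = u"/home/user/.tahoe"

    def parse(self, args):
        # Fall back to the class-level default when no directory is given.
        self.basedir = args[0] if args else self.default_nodedir
        return self


def parse_with_default(tmpdir, args):
    # Patch the class attribute (not a module global), since that is
    # where the value is read from at parse time.
    with patch.object(BasedirOptions, "default_nodedir", tmpdir):
        return BasedirOptions().parse(args).basedir
```

The patch is undone when the `with` block exits, so other tests still see the original default.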
exarkun · 2017-08-07T17:38:16Z

src/allmydata/scripts/tahoe_daemonize.py

        self.nodetype = nodetype
        self.basedir = basedir
-    def makeService(self, so):
+
+    def startService(self):
        # delay this import as late as possible, to allow twistd's code to
        # accept --reactor= selection. N.B.: this can't actually work until
        # this file, and all the __init__.py files above it, also respect the
        # prohibition on importing anything that transitively imports
        # twisted.internet.reactor . That will take a lot of work.


It's no longer clear what this comment refers to. Possibly it wasn't terribly clear before this change, either. I would have guessed it's about the twisted.internet.reactor import. It could possibly be about the allmydata.client.Client / allmydata.introducer.server.IntroducerNode / allmydata.stats.StatsGathererService imports.

exarkun · 2017-08-07T17:39:07Z

src/allmydata/scripts/tahoe_daemonize.py

+            try:
+                service_factory = node_to_instance[self.nodetype]
+            except KeyError:
+                raise ValueError("unknown nodetype %s" % self.nodetype)


... % (self.nodetype,))

Or maybe "unknown nodetype {}".format(self.nodetype)

Yeah .. I'd prefer .format too, but I'm also trying to be consistent -- as in, a codebase-wide "switch from % to .format" PR would be better, IMO?

I tend to think not ... Would you want to review that patch?

Review: mmm, probably annoying.

But I like this kind of PR when there are code-base-wide problems, because when you're rebasing/merging any older branches it's way easier to know what the right way to resolve the conflicts is ("change everything to .format(), okay").
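
For readers wondering why the `(self.nodetype,)` spelling was suggested: with a bare `%`, a tuple on the right-hand side is unpacked as the argument list. A short sketch with illustrative function names:

```python
def unknown_nodetype_message(nodetype):
    # Wrapping the argument in a 1-tuple formats any value as a single
    # item, even when the value is itself a tuple.
    return "unknown nodetype %s" % (nodetype,)


def unknown_nodetype_message_format(nodetype):
    # str.format() has no special case for tuples.
    return "unknown nodetype {}".format(nodetype)
```

With the unwrapped form, a two-element tuple raises `TypeError` instead of producing the intended error message, which hides the original problem.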

I believe I've covered everything from the review (one followup PR required for the identify_node_type thing)

meejah · 2017-08-17T17:13:15Z

Yay windows! exceptions.WindowsError: [Error 206] The filename or extension is too long: 'allmydata.test.cli.test_invite\\Invite\\test_invite_wrong_server_abiliti\\iqbap1\\temp\\servers\\xgru5adv\\storage\\shares\\incoming'

meejah · 2017-09-05T21:23:14Z

Anyone have any bright ideas for the windows stuff? It is sad that pathnames are too long.

This sets the stage for further changes to the startup process so that "async things" are done before we create the Client instance, while still reporting early failures to the shell where "tahoe start" is running. Also adds a bunch of test-coverage for the things that got moved around, even though they didn't have coverage before.

warner · 2017-09-19T17:20:35Z

this looks good, meejah and I are landing it now

exarkun requested changes May 18, 2017

meejah force-pushed the daemonize-tahoe-start branch from 5a66ef4 to 58f953d Compare June 6, 2017 17:42

meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 48002df to 6b22b77 Compare August 1, 2017 21:58

exarkun previously requested changes Aug 7, 2017

meejah force-pushed the daemonize-tahoe-start branch 7 times, most recently from a72571c to 14dbe9e Compare August 16, 2017 21:21

meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 8966e4d to 1c1c73d Compare August 16, 2017 21:44

meejah force-pushed the daemonize-tahoe-start branch 2 times, most recently from 6c735d6 to a1e3734 Compare September 5, 2017 21:02

meejah mentioned this pull request Sep 11, 2017

Async startup: pull out config #443

Merged

3 tasks

meejah force-pushed the daemonize-tahoe-start branch 3 times, most recently from 1f1b8f7 to d3397f5 Compare September 12, 2017 22:48

meejah added 2 commits September 19, 2017 10:39

stop chdir

263a3f4

meejah force-pushed the daemonize-tahoe-start branch from d3397f5 to 263a3f4 Compare September 19, 2017 16:39

warner merged commit 263a3f4 into tahoe-lafs:master Sep 19, 2017

meejah deleted the daemonize-tahoe-start branch September 19, 2017 17:46


