-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split up startstop_node and add 'tahoe daemonize' #417
Conversation
Hmm, it seems at least the unit-tests that start introducer etc don't work with the new code (i.e. they're not seeing the startup message in the logs)? Can someone with MacOS test this branch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Some comments inline. Mostly, but not entirely, documentation related or non-actionable. I also tried out the code and managed to start and stop a tahoe node with start
, stop
, run
, and daemonize
. Only a couple minor hiccups (also explained inline).
@@ -41,7 +53,13 @@ class Options(usage.Options): | |||
+ stats_gatherer.subCommands | |||
+ admin.subCommands | |||
+ GROUP("Controlling a node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. Weird. :/ Maybe there should be an upstream feature request for subcommand groups?
@@ -29,6 +30,17 @@ def GROUP(s): | |||
if _default_nodedir: | |||
NODEDIR_HELP += " [default for most commands: " + quote_local_unicode_path(_default_nodedir) + "]" | |||
|
|||
|
|||
# XXX all this 'dispatch' stuff needs to be unified + fixed up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True statement.
def identify_node_type(basedir): | ||
for fn in listdir_unicode(basedir): | ||
if fn.endswith(u".tac"): | ||
tac = str(fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is old code just being moved so I guess it shouldn't change ... but if I were going to change it, I would drop the str(fn)
(which is probably always the wrong thing to do to a unicode string) so that tac
remains unicode
. Then I'd made the tuple being iterated over below contain unicode
strings as well.
The application code that I can see that uses identify_node_type
(at least, that's included in the diff github is showing me right now) expects the return value to be text-ish (for example, it formats it into a string for a person to read). So the old return type of bytes is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree this should almost certainly be unicode. Most of its use is really "in code" in basically switch statements. I didn't want to muck about with existing code "too much" but could be easily convinced to do more "mucking" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to unicode.
from allmydata.stats import StatsGathererService | ||
srv = StatsGathererService(verbose=True) | ||
else: | ||
raise ValueError("unknown nodetype %s" % self.nodetype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old code just being re-indented... So probably doesn't make sense to change it much. But:
- codecov reports it as uncovered - not sure if that's true or not (multiprocess shenanigans?)
- clearly should be doing some more data-y dispatch. Most naively:
d = {
"client": lambda: namedAny("allmydata.client.Client")(self.basedir),
"introducer": lambda: namedAny("allmydata.introducer.server.IntroducerNode")(self.basedir),
...
}
if self.nodetype in d:
srv = d[self.nodetype]()
else:
raise ValueError(...)
srv = StatsGathererService(verbose=True) | ||
else: | ||
raise ValueError("unknown nodetype %s" % self.nodetype) | ||
print("SRV {}".format(srv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray debug print?
|
||
def start(config): | ||
class DaemonizeTahoeNodePlugin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this brand new class be new-style instead of classic? And perhaps have a class docstring.
@@ -166,102 +174,9 @@ def start(config): | |||
else: | |||
verb = "starting" | |||
|
|||
runner = twistd._SomeApplicationRunner(twistd_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sad about most of the bits being private in Twisted. I suspect this in particular may cause some trouble sooner rather than later (to the extent anything in Twisted happens "sooner") due to the twistd / twist situation and the apparent desire to move things over to twist. OTOH, perhaps that actually means all of the twistd implementation will remain the same until it is deprecated and removed, which wouldn't be so bad for this...
codecov also says this code is uncovered, though. If that's really the case, adding some test coverage here seems particularly important.
On the gripping hand, all runApp
did was make a runner and then call run on it... Which is all this code appears to be doing now, albeit with an intervening print
. Can the print just happen before a runApp
call and avoid the need to touch this private API?
src/allmydata/scripts/runner.py
Outdated
+ [ | ||
["daemonize", None, tahoe_daemonize.DaemonizeOptions, "run a node disconnected from terminal"], | ||
["start", None, tahoe_start.StartOptions, "start a node"], | ||
["run", None, tahoe_run.RunOptions, "run a node"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not terribly clear what the difference between "start" and "run" is here. I think the old text included the word "synchronously" which I don't think added a whole lot. I think "run" is "run in the foreground", "start" is "old-style run daemonized, maybe deprecated soon", and "daemonize" is "new-style run daemonized"?
src/allmydata/scripts/tahoe_start.py
Outdated
print("Logs are available in '{}'".format(log_fname)) | ||
print("Collected for this run:") | ||
print(collected) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some additional logic may be needed here. For example, tahoe start /path/... --help
now displays twistd's help and then this warning about something having gone wrong.
Maybe --help
is the only case not handled by the CalledProcessError
case above? It seems like most mis-uses of twistd options cause it to exit with a non-zero code, following the exception path above.
I could also imagine the --logfile
and/or --logger
options causing problems here. Not sure how much effort it is worth going to to make sure every obscure case works exactly as it did before... Maybe better to quickly deprecate this particular invocation and move on. That's the plan, right? Get everyone to use daemonize
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that looks like an edge case just when you provide both a node-directory and --help
(i.e. just tahoe start --help
does the right thing it seems).
Yes, --logfile
does cause problems -- because then we're watching the wrong logs. Hmm :/
The idea is that people who really want daemonization should use tahoe daemonize
and that most users should use tahoe run
along with their favourite "thing that runs daemons". So yes: deprecated tahoe start
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two ways to deal with the --logger
option: if any logger options are provided, we say we can't watch the logs and exit immediately or we try to grok the logger option enough to see if it's a file and then watch that. If it's syslog, then "exercise to the user".
My suspicion is that users with strong feelings about logging who pass options are able to watch it themselves for errors? But, who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing loudly sounds fine to me. I'd say go with the former.
src/allmydata/scripts/tahoe_start.py
Outdated
# order to support async startup, we introduced "tahoe daemonize" | ||
# which does more-or-less what "tahoe start" used to. Now, "tahoe | ||
# start" spawns "tahoe daemonize" and then determines whether | ||
# tahoe has started successfully or hasn't (within 5 seconds). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this information belongs one layer up - something describing all of the process management options collectively, in relation to themselves, and maybe just focusing on modern behavior and not trying to document the history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated docs/CLI.rst
Codecov Report
@@ Coverage Diff @@
## master #417 +/- ##
==========================================
+ Coverage 89.64% 89.83% +0.19%
==========================================
Files 140 144 +4
Lines 26988 27084 +96
Branches 3883 3891 +8
==========================================
+ Hits 24194 24332 +138
+ Misses 2070 2021 -49
- Partials 724 731 +7
Continue to review full report at Codecov.
|
5a66ef4
to
58f953d
Compare
@exarkun I think I've addressed All The Things from your review. Obviously there's still some tests to write, though. |
What implications does this change has for Docker containers? My concerns for correct Docker execution are:
|
@pataquets For Docker you should use From the perspective of this PR, I don't think the implications for Docker change at all: you shouldn't have used Ultimately, we'd like to deprecate everything except
For context, this PR is a first-step in refactoring a bunch of internal Tahoe startup APIs to be more async-friendly (that is, factory functions instead of |
(But all that said, if there's something in here that makes Tahoe harder to run under Docker I'd be glad to know!) |
Understood. I'm not very deep in Python and just wanted to make sure, since I can't understand the PR myself. |
48002df
to
6b22b77
Compare
docs/frontends/CLI.rst
Outdated
"``tahoe create-node [NODEDIR]``" is the basic make-a-new-node | ||
command. It creates a new directory and populates it with files that | ||
will allow the "``tahoe start``" and related commands to use it later | ||
on. `tahoe create-node` creates nodes that have client functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the convention that dictates "``tahoe ...``"
in one place and `tahoe ...`
in another.
|
||
|
||
class StartOptions(BasedirOptions): | ||
def identify_node_type(basedir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is the function that the tahoe invite
branch wants so it can figure out if it's on an introducer node or not.
opts.getSynopsis() | ||
opts.getUsage() | ||
|
||
def test_daemonize_defaults(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on travis and on my local system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the runner.dispatch()
is allowing this to run too far, and it looks at ~/.tahoe
(which isn't a real node directory on most systems, so the command bails with "... doesn't look like a directory at all").
I'd start by adding something to the assertion to make this easier to spot:
self.assertEqual(0, exit_code[0],
[exit_code[0], o.getvalue(), e.getvalue()])
and then I think we want two separate steps. The first should build a config from the command with no arguments, and it should look at the resulting config and check that the nodedir is pointing at ~/.tahoe
(or whatever the platform-specific value is.. there's a function in scripts/ somewhere that returns that value). Then we should modify the config object to point at a tempdir, and then allow the second part to run (runner.dispatch
), and make sure it behaves sensibly.
Or, maybe we should mock out that what-is-the-default-nodedir function, and have it return the tempdir instead?
(what's this testing, exactly? is it that we can run tahoe daemonize
on a plausibly-valid nodedir without complaint? Are you patching twistd
to catch it at the last moment before it really spawns the daemon?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, mock.patch("allmydata.scripts.tahoe_daemonize._default_nodedir", self.mktemp())
ought to do something useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, make that:
tmpdir = self.mktemp()
base = dirname(tmpdir).decode(getfilesystemencoding())
with patch('allmydata.scripts.tahoe_daemonize.twistd'):
with patch('allmydata.scripts.common.BasedirOptions.default_nodedir',
base):
config = runner.parse_or_exit_with_explanation([
'daemonize',
])
(the BasedirOptions class stashes the default_nodedir
value early, and it
also requires the value to be unicode)
self.nodetype = nodetype | ||
self.basedir = basedir | ||
def makeService(self, so): | ||
|
||
def startService(self): | ||
# delay this import as late as possible, to allow twistd's code to | ||
# accept --reactor= selection. N.B.: this can't actually work until | ||
# this file, and all the __init__.py files above it, also respect the | ||
# prohibition on importing anything that transitively imports | ||
# twisted.internet.reactor . That will take a lot of work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's no longer clear what this comment refers to. Possibly it wasn't terribly clear before this change, either. I would have guessed it's about the twisted.internet.reactor
import. It could possibly be about the allmydata.client.Client
/ allmydata.introducer.server.IntroducerNode
/ allmydata.stats.StatsGathererService
imports.
try: | ||
service_factory = node_to_instance[self.nodetype] | ||
except KeyError: | ||
raise ValueError("unknown nodetype %s" % self.nodetype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... % (self.nodetype,))
Or maybe "unknown nodetype {}".format(self.nodetype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah .. I'd prefer .format
too, but I'm also trying to be consistent -- as in, a codebase-wide "switch from % to .format
" PR would be better, IMO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to think not ... Would you want to review that patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review: mmm, probably annoying.
But I like this kind of PR when there are code-base-wide problems, because when you're rebasing/merging any older branches it's way easier to know what the right way to resolve the conflicts is ("change everything to .format(), okay").
a72571c
to
14dbe9e
Compare
I believe I've covered everything from the review (one followup PR required for the identify_node_type thing)
8966e4d
to
1c1c73d
Compare
Yay windows! |
6c735d6
to
a1e3734
Compare
Anyone have any bright ideas for the windows stuff? It is sad that pathnames are too long |
1f1b8f7
to
d3397f5
Compare
This sets the stage for further changes to the startup process so that "async things" are done before we create the Client instance while still reporting early failures to the shell where "tahoe start" is running Also adds a bunch of test-coverage for the things that got moved around, even though they didn't have coverage before
d3397f5
to
263a3f4
Compare
this looks good, meejah and I are landing it now |
This sets the stage for further changes to the startup process so that "async things" are done before we create the Client instance while still reporting early failures to the shell where "tahoe start" is running
"tahoe daemonize" does what "tahoe start" did before. The new "tahoe start" shells out to "tahoe daemonize" under the hood and then monitors the logs for expected start-up messages -- when it informs the user and exits.