New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds optional waiting to cs tail #854

Merged
merged 12 commits into from May 27, 2018

Conversation

Projects
None yet
3 participants
@dposada
Copy link
Member

dposada commented May 15, 2018

Changes proposed in this PR

  • adding --wait flag to cs tail

Why are we making these changes?

To allow waiting until the job is running and the file being tailed is available when running, for example, cs tail -f ... on a pending job.

@dposada dposada added cli wip labels May 15, 2018

@dposada dposada self-assigned this May 15, 2018

@dposada dposada closed this May 16, 2018

@dposada dposada reopened this May 16, 2018

@dposada dposada changed the title Adds optional retrying for cs tail Adds optional waiting for cs tail May 16, 2018

@dposada dposada changed the title Adds optional waiting for cs tail Adds optional waiting to cs tail May 16, 2018

dposada added some commits May 16, 2018

@dposada dposada closed this May 17, 2018

@dposada dposada reopened this May 17, 2018

@dposada dposada removed the wip label May 17, 2018

@dposada dposada requested review from DaoWen and sradack May 17, 2018


if wait_seconds == -1:
wait_seconds = None
elif not wait_seconds:

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

In Python, not 0.0 is True. I don't think that leads to the behavior we want here.

This comment has been minimized.

@dposada

dposada May 18, 2018

Member

This is now removed

@@ -124,14 +126,20 @@ def tail(clusters, args, _):
lines = args.get('lines')
follow = args.get('follow')
sleep_interval = args.get('sleep-interval')
wait_seconds = args.get('wait')

if wait_seconds == -1:

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

I think we should change this to if wait_seconds <= 0.

This comment has been minimized.

@dposada

dposada May 18, 2018

Member

This is now removed

@@ -143,6 +151,9 @@ def register(add_parser, add_defaults):
parser.add_argument('--sleep-interval', '-s',
help=f'with -f, sleep for N seconds (default {DEFAULT_FOLLOW_SLEEP_SECS}) between iterations',
metavar='N', type=float)
parser.add_argument('--wait', '-w',
help=f'wait for up to N seconds (default is to wait indefinitely)',
metavar='N', type=float, nargs='?', default=-1)

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

It looks like either the help string or the default is wrong here.

I think the default should be float('inf') (i.e., wait indefinitely—which matches the current help string description), and the user can do -w0 if they don't want to wait.

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

FYI, it doesn't look like there'd be any issue with passing float('inf') into the tenacity retry annotation: https://github.com/jd/tenacity/blob/a86237f4d0dee13d328e819d7d9bf038c8bb0a77/tenacity/stop.py#L90-L97

This comment has been minimized.

@dposada

dposada May 18, 2018

Member

As discussed, I removed the ability to specify an amount of time to wait for. So now, by default, we don't wait at all (current behavior), and with --wait, we always wait indefinitely

@@ -143,6 +151,9 @@ def register(add_parser, add_defaults):
parser.add_argument('--sleep-interval', '-s',
help=f'with -f, sleep for N seconds (default {DEFAULT_FOLLOW_SLEEP_SECS}) between iterations',
metavar='N', type=float)
parser.add_argument('--wait', '-w',
help=f'wait for up to N seconds (default is to wait indefinitely)',

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

This doesn't need to be an f-string.

This comment has been minimized.

@dposada

dposada May 18, 2018

Member

Done, thanks

if wait_seconds == -1:
wait_seconds = None
elif not wait_seconds:
wait_seconds = sys.maxsize

This comment has been minimized.

@DaoWen

DaoWen May 17, 2018

Contributor

I think we should change the default, then we can get rid of this branch altogether. See below.

This comment has been minimized.

@dposada

dposada May 18, 2018

Member

This is removed now

@@ -8,6 +8,8 @@
from operator import itemgetter
from urllib.parse import urlparse, parse_qs

import tenacity

This comment has been minimized.

@sradack

sradack May 20, 2018

Contributor

Are we worried that this might increase startup time?

This comment has been minimized.

@dposada

dposada May 21, 2018

Member

Changed it to a local import that only happens in the --wait case


if wait:
r = tenacity.Retrying(wait=tenacity.wait_fixed(5))
r.call(query_unique_and_run)

This comment has been minimized.

@sradack

sradack May 20, 2018

Contributor

Does this eventually give up? Should we consider retrying only for specific exceptions?

This comment has been minimized.

@dposada

dposada May 21, 2018

Member

This never gives up. I introduced CookRetriableException and switched to using it in the specific spots where we want retrying.

This comment has been minimized.

@DaoWen

DaoWen May 21, 2018

Contributor

I like this change. Now we won't get into an infinite retry loop when there's some unexpected error.

This comment has been minimized.

@sradack

sradack May 22, 2018

Contributor

We should eventually give up, especially since it's not clear to me that all exceptions marked retriable will eventually get resolved. We should ensure that the error message is also presented to the user after timeout.

This comment has been minimized.

@dposada

dposada May 22, 2018

Member

How about if we add a separate flag (e.g. --wait-for '5 minutes') for the eventually giving up case, and leave the --wait as an indefinite wait? It's definitely true that those exceptions may not get resolved. As a user, though, I may want this to block until they do get resolved. For example, I submit a job, and it may take a long time for it to start, and then once it starts it may take a long time for my file to appear, but I would prefer to just wait until those things are true without having to decide if I should wait 30 minutes or 90 minutes or longer.

This comment has been minimized.

@dposada

dposada May 23, 2018

Member

@sradack What are your thoughts on this?

This comment has been minimized.

@dposada

This comment has been minimized.

@dposada

dposada May 26, 2018

Member

@sradack I added a 1-day timeout on the --wait, and it re-raises the last exception

@DaoWen

DaoWen approved these changes May 21, 2018

@sradack sradack merged commit ca4bcb8 into twosigma:master May 27, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dposada dposada deleted the dposada:tail-retry branch May 27, 2018

dposada added a commit to dposada/Cook that referenced this pull request Nov 1, 2018

Adds optional waiting to cs tail (twosigma#854)
* Adds optional retrying for cs tail

* Renames 'retry' to 'wait'

* Changes flag to --wait

* Bumps version

* Makes test_manage_task_random_binary_output be xfail

* Removes the option to specify how long to wait for

* Removes f from string

* Removes superfluous import

* Introduces CookRetriableException

* Stops waiting after 1 day (and re-raises the last exception)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment