Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes TimedPhase auto-closeable. #143

Merged
merged 4 commits into from
Mar 1, 2017
Merged

Conversation

archolewa
Copy link
Contributor

--This would allow us to wrap timed phases in a try-with-resources
block:

   try(TimedPhase phase = RequestLog.startTiming("timerAllTheThings") {
                  doAllTheThings();
    }

--This has several advantages over the current approach:
1. We always want to wrap the code that stops a timer in a finally block.
Otherwise, the timer won't be stopped if we have an
exception. try-with-resources makes that clean.
2. It reduces the chances of typos when starting and stopping
timers. It also reduces the chances of people staring a timer
and then forgetting to stop it.
3. It is more concise.

--This has several limitations:
1. The minimum effort code to implement it is awkward. We'll
need to do a much more significant (and not yet clear) refactor
of RequestLog to handle this cleanly.
2. Timers that cross threads can't be handled this way. Those
still need to be stopped and started manually.

--Consequences that are more mixed:
1. It exposes a lot more of the inner workings of RequestLog.
This could make it easier to customers to perform their own
custom timings, but it also exposes code that is very easy
to use incorrectly if you don't have a strong understanding
of it.

@archolewa
Copy link
Contributor Author

There are a couple of changes here:

  1. I moved TimedPhase out into its own class, and made it beanlike. This isn't strictly necessary, but makes it a little bit cleaner IMO since it can now show up in the code independent of the RequestLog.

  2. Stopping a timer and storing its duration in the RequestLog have been split into two steps:

    1. Stop the timer.
    2. Store the timer's final time in the RequestLog's registery.

This was necessary because TimedPhase no longer has insight into the guts of RequestLog.

Both of these steps have been encapsulated in TimedPhase::close, which is automatically invoked at the end of a try-with-resources block with the finally semantics.

@michael-mclawhorn
Copy link
Contributor

I like this approach a lot

Copy link
Collaborator

@cdeszaq cdeszaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small things.

Also, ChangeLog needed.

@@ -0,0 +1,88 @@
// Copyright 2016 Yahoo Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong year

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fist shake JANUARRYY!!!!

* Stop the phase.
*/
public void stop() {
if (start == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to convert to use the isStarted method

* Start the phase.
*/
public void start() {
if (start != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to convert to use the isStarted method

@@ -275,6 +231,7 @@ public static void startTiming(String timePhaseName) {
}
current.mostRecentTimer = timePhase;
timePhase.start();
return timePhase;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make TimedPhase::start return itself?

return;
}
duration += System.nanoTime() - start;
REGISTRY.timer(name).update(duration, TimeUnit.NANOSECONDS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting lost. It may very well be that TimedPhase::stop needs to do the registration that TimedPhase::close is doing? Otherwise, I think there's a gap where things could be lost.

Or, to put it a different way, stop did the registering before, and now only close does it. Since stop was private, I don't know how much impact it will really have, but it's worth noting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your concern about registering "getting lost." It's now invoked as part of the TimedPhase::close, which is being called by RequestLog::stopTiming. So the behavior of the public interface hasn't changed. So this change will only affect people if they are doing Unsupported Things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I figured out what the problem was that I was worried about: Since stop is a public method, it's not clear what the difference between stop and close is, so there's a possibility that someone will call stop, but not close, which would result in not recording that chunk of time in the RequestLog.

I see the value in having stop be public, and separate from close, since it allows for accruing time in chunks, but only recording it at the end, but I think some more documentation is needed on stop (and possibly on close) to indicate that it's just stopping the timer, and not registering it with RequestLog.

It also seems like there may be a mismatch in responsibilities, because if someone wanted to do something like:

start -> stop -> start -> stop -> close

they would get a warning logged, since close tries to stop. Perhaps close just needs to check !isStarted() before calling stop()?

@@ -308,15 +277,15 @@ public static void stopTiming(String timePhaseName) {
LOG.warn("Tried to stop non-existent phase: {}", timePhaseName);
return;
}
timePhase.stop();
timePhase.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make "stop" convert to doing the registering, calling "close" is likely not needed, but "stop" will be.

Copy link
Contributor Author

@archolewa archolewa Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close has to do the registering anyway, because that's the method invoked when the try-with-resources block ends. So even if we change how stop works, and had close delegate to stop, we still wouldn't need to change this invocation.

Copy link
Contributor Author

@archolewa archolewa Jan 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus we shouldn't change this from close to stop. In my opinion, RequestLog::stopTiming should have the same semantics as what the try-with-resources block does (with the exception of the existence check that the try-with-resources doesn't have to worry about).

@archolewa
Copy link
Contributor Author

Since there seems to be strong support for this change, I'm going to flesh it out into a proper PR tomorrow (updating timers that could benefit from it, adding a CHANGELOG, etc).

I'm also going to squash the commits that are currently here down into logical commits to facilitate a proper review.

@michael-mclawhorn
Copy link
Contributor

👍


/**
* Constructor.
* Times are in nanoseconds.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here? There is nothing time-related in the constructor. Should this be moved up to the class-level doc?

return;
}
duration += System.nanoTime() - start;
REGISTRY.timer(name).update(duration, TimeUnit.NANOSECONDS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I figured out what the problem was that I was worried about: Since stop is a public method, it's not clear what the difference between stop and close is, so there's a possibility that someone will call stop, but not close, which would result in not recording that chunk of time in the RequestLog.

I see the value in having stop be public, and separate from close, since it allows for accruing time in chunks, but only recording it at the end, but I think some more documentation is needed on stop (and possibly on close) to indicate that it's just stopping the timer, and not registering it with RequestLog.

It also seems like there may be a mismatch in responsibilities, because if someone wanted to do something like:

start -> stop -> start -> stop -> close

they would get a warning logged, since close tries to stop. Perhaps close just needs to check !isStarted() before calling stop()?

return TimeUnit.NANOSECONDS;
}

public boolean isStarted() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may seem stupid, but should we change the name of this method to isRunning? We seem to be using it to indicate if this TimedPhase is currently running or not, and isStarted feels more like a "once I've been started, I've always been started" sort of thing, where as running / stopped feels more like a toggleable state.

Obviously this is a small thing, but every time I see isStarted, I noticed I had to mentally pause and convert to isRunning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

public long getDuration() {
return duration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two places in RequestLog where the duration is retrieved, and those places are just after an isStarted check. Could we move that isStarted check / warning log into this getter so that the RequestLog doesn't have to worry about it?


private final String name;
private long start;
private long duration;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think start and duration need to be made thread-safe (likely with AtomicLongs)... It wasn't an issue before, because TimedPhase was encapsulated within the RequestLog, which was a ThreadLocal and was never able to exist on more than 1 thread at a time, but TimedPhase is now it's own thing, and there's a possibility of sharing them across threads.

I tried playing around with using AtomicLong for these variables a little bit, and it's not hard to retain the existing semantics, though it did make me think about if we wanted to maintain the existing semantics around multiple consecutive starts / stops, especially in the new world of being across threads. No good solutions to this came to mind immediately, so it may make sense to add a cautionary note to the class-level doc for TimedPhase indicating the possible race condition of starting and stopping the same phase on multiple threads.

Anyway, here's the relevant results of my exploration, in case they help:

private static final long NOT_RUNNING = 0;

private final String name;
private final AtomicLong start = new AtomicLong();
private final AtomicLong duration = new AtomicLong();

/**
 * Start the phase.
 *
 * @return This phase after being started
 */
public TimedPhase start() {
    if (!start.compareAndSet(NOT_RUNNING, System.nanoTime())) {
        LOG.warn("Tried to start timer that is already running: {}", name);
    }
    return this;
}

/**
 * Stop the phase.
 */
public void stop() {
    long started = start.getAndUpdate(ignoredCurrent -> NOT_RUNNING);
    if (started == 0) {
        LOG.warn("Tried to stop timer that has not been started: {}", name);
        return;
    }
    duration.addAndGet(System.nanoTime() - started);
}

public long getDuration() {
    return duration.get();
}

public boolean isStarted() {
    return start.get() != 0;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with making it thread safe, especially since many of these timers are stopped on a different thread than they're started (the request vs response thread). Rethinking the semantics is outside the scope of this PR though, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after playing with it a little bit, I think I'm going to punt on this, largely because validating whether a timer is running outside of this class or not becomes a lot trickier in a multi-threaded scenario.

@cdeszaq
Copy link
Collaborator

cdeszaq commented Mar 1, 2017

A few small things, and medium thing (the warning logger), and a large thing (the thread safety). I'm OK punting on the TimedPhase thread safety (I think) as long as we clearly note that it's not a thread-safe class if both classes are starting / stopping. It would be great to close that gap, but I'm happy to take the clarity offered by it being closable, and coming back to making TimedPhase thread-safe. (A techdebt ticket might be a good idea)

Other than the thread safety issue, if everything else is updated, this is good to merge.

@archolewa archolewa force-pushed the timing-phase-auto-closeable branch from 067af4d to ed75539 Compare March 1, 2017 16:55
@archolewa
Copy link
Contributor Author

Ticket to revisit making timers thread-safe: #185

@archolewa archolewa force-pushed the timing-phase-auto-closeable branch from ed75539 to d1daab8 Compare March 1, 2017 17:12
--This would allow us to wrap timed phases in a try-with-resources
block:

```java
   try(TimedPhase phase = RequestLog.startTiming("timerAllTheThings") {
                  doAllTheThings();
    }
```

--This has several advantages over the current approach:
        1. We always want to wrap the code that stops a timer in a finally block.
            Otherwise, the timer won't be stopped if we have an
            exception. try-with-resources makes that clean.
        2. It reduces the chances of typos when starting and stopping
            timers. It also reduces the chances of people staring a timer
            and then forgetting to stop it.
        3. It is more concise.

--This has several limitations:
        1. The minimum effort code to implement it is awkward. We'll
            need to do a much more significant (and not yet clear) refactor
            of RequestLog to handle this cleanly.
        2. Timers that cross threads can't be handled this way. Those
            still need to be stopped and started manually.

--Consequences that are more mixed:
        1. It exposes a lot more of the inner workings of RequestLog.
            This could make it easier to customers to perform their own
            custom timings, but it also exposes code that is very easy
            to use incorrectly if you don't have a strong understanding
            of it.
--Those timers that are started and stopped in the same context now use
try-with-resources blocks that start the timer.
@archolewa archolewa force-pushed the timing-phase-auto-closeable branch from d1daab8 to cf67dc0 Compare March 1, 2017 17:24
@archolewa archolewa merged commit 25057d1 into master Mar 1, 2017
@archolewa archolewa deleted the timing-phase-auto-closeable branch March 1, 2017 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants