Fixes some flaky asynchronous tests. #39

archolewa · 2016-09-15T19:21:32Z

--A test version of the AsynchronousWorkflowsBuilder is introduced. This
version extends the DefaultAsynchronousWorkflowsBuilder, and builds the
exact same workflow. However, it also provides a mechanism for outside
classes to add additional Observers to the workflows, allowing them to
do such things as add countdown latches to have thread-safe tests.

dayamr · 2016-09-15T21:15:59Z

Thank you for explaining, Andrew. It looks good to me. 👍

cdeszaq · 2016-09-15T21:02:03Z

CHANGELOG.md

@@ -9,6 +9,10 @@ Current
 -------

 ### Added:
+-  [A test implementation of the `AsynchronousWorkflowsBuilder`, `TestAsynchronousWorkflowsBuilder`](http://github.com/yahoo/fili/pull/36)


PR 39, yes? (also other change log entry)

Heheh. Looks like I copied-pasted-forgot-to-modify.

cdeszaq · 2016-09-15T21:25:58Z

...rc/test/java/com/yahoo/bard/webservice/async/workflows/TestAsynchronousWorkflowsBuilder.java

+        workflowMap.put(Workflow.SYNCHRONOUS, workflows.getSynchronousPayload());
+        workflowMap.put(Workflow.ASYNCHRONOUS, workflows.getAsynchronousPayload());
+        workflowMap.put(Workflow.PRERESPONSE_READY, workflows.getPreResponseReadyNotifications());
+        workflowMap.put(Workflow.JOB_MARKED_COMPLETE, workflows.getJobMarkedCompleteNotifications());


It might be nice to decouple this from the specific workflow steps a bit. As it stands, if a workflow adds steps, there's no way to attach a subscriber to them. I'm not sure we can do it right now, since this is Java code, and to do it cleanly we'd want to use Groovy's ability to call methods by name from strings, but it would be a nice-to-have capability. Likely more for "do later" rather than any time soon.

Yeah, I wasn't a particularly big fan of this, either. In Java, the best approach to handle this sort of thing may be to make the AsynchronousWorkflows object less rigid. Give it a map that maps interfaces to Observables, with an enum as the default implementation of the interface that enumerates the current workflows, rather than having explicit fields and getters. Then this class can simply iterate over the map.

I consider that to be outside the scope of this PR, though.

Yeah, that is a good way of doing it in Java land. And yeah, it can happen later. No need to slow down this PR.
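For reference, the map-based design discussed above might look roughly like the sketch below. Every name here (`WorkflowStep`, `DefaultWorkflowStep`, `WorkflowRegistry`) is hypothetical and not part of this PR, and the type parameter `S` stands in for the Observable type so that the sketch compiles without RxJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical key type: any workflow step, not only the four known today.
interface WorkflowStep {
    String getName();
}

// Default implementation that enumerates the steps used in this PR.
enum DefaultWorkflowStep implements WorkflowStep {
    SYNCHRONOUS, ASYNCHRONOUS, PRERESPONSE_READY, JOB_MARKED_COMPLETE;

    @Override
    public String getName() {
        return name();
    }
}

// Hypothetical replacement for explicit fields and getters.
// S stands in for the stream type (for example, an Observable).
final class WorkflowRegistry<S> {
    private final Map<WorkflowStep, S> steps = new LinkedHashMap<>();

    WorkflowRegistry<S> register(WorkflowStep step, S stream) {
        steps.put(step, stream);
        return this;
    }

    // A test builder could use this to attach subscribers to every step,
    // including steps that are added later.
    void forEachStep(BiConsumer<WorkflowStep, S> action) {
        steps.forEach(action);
    }
}
```

With something like this, the test builder would no longer need one `workflowMap.put(...)` line per getter; it would visit whatever steps the registry holds.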

cdeszaq · 2016-09-15T22:00:24Z

fili-core/src/test/groovy/com/yahoo/bard/webservice/async/AsyncDruidSendsErrorSpec.groovy

+    def setup() {
+        TestAsynchronousWorkflowsBuilder.addSubscriber(
+                TestAsynchronousWorkflowsBuilder.Workflow.JOB_MARKED_COMPLETE,
+                new Observer() {


This is ugly and clutters the test with more stuff than is needed to get the point across. We can do better (after all, this is Groovy):

    TestAsynchronousWorkflowsBuilder.addSubscriber(TestAsynchronousWorkflowsBuilder.Workflow.JOB_MARKED_COMPLETE) { jobMetadataReady.countDown() } { throw it }

Or we can pass them as parameters if you don't like the implicit closure-passing style:

    TestAsynchronousWorkflowsBuilder.addSubscriber(
            TestAsynchronousWorkflowsBuilder.Workflow.JOB_MARKED_COMPLETE,
            { jobMetadataReady.countDown() },
            { throw it }
    )

To achieve this, we need to add some more helpers to the TestAsynchronousWorkflowsBuilder:

    /**
     * Adds the specified subscriber to the specified workflow.
     *
     * @param workflow  The workflow to add the subscriber to
     * @param workflowSubscriber  The subscriber that should be added to the specified workflow
     */
    public static void addSubscriber(Workflow workflow, Observer workflowSubscriber) {
        SUBSCRIBERS.put(workflow, workflowSubscriber);
    }

    /**
     * Adds the subscriber (specified in individual method components) to the specified workflow.
     *
     * @param workflow  The workflow to add the subscriber to
     * @param onNext  onNext method for the observer
     * @param onCompleted  onCompleted method for the observer
     * @param onError  onError method for the observer
     */
    public static void addSubscriber(
            Workflow workflow,
            Consumer<Object> onNext,
            Runnable onCompleted,
            Consumer<Throwable> onError
    ) {
        Observer workflowSubscriber = new Observer() {
            @Override
            public void onNext(Object next) {
                onNext.accept(next);
            }

            @Override
            public void onCompleted() {
                onCompleted.run();
            }

            @Override
            public void onError(Throwable error) {
                onError.accept(error);
            }
        };
        SUBSCRIBERS.put(workflow, workflowSubscriber);
    }

    /**
     * Subscribes the partially-specified observer to the specified workflow.
     * <p>
     * Uses a no-op onCompleted method.
     *
     * @param workflow  The workflow to add the subscriber to
     * @param onNext  onNext method for the observer
     * @param onError  onError method for the observer
     */
    public static void addSubscriber(Workflow workflow, Consumer<Object> onNext, Consumer<Throwable> onError) {
        // Pass onError through; only onCompleted is a no-op here.
        addSubscriber(workflow, onNext, () -> { }, onError);
    }

    /**
     * Subscribes the partially-specified observer to the specified workflow.
     * <p>
     * Uses no-op onCompleted and onError methods.
     *
     * @param workflow  The workflow to add the subscriber to
     * @param onNext  onNext method for the observer
     */
    public static void addSubscriber(Workflow workflow, Consumer<Object> onNext) {
        addSubscriber(workflow, onNext, ignored -> { });
    }

Note that this suggestion also applies to the other spec that gets updated in this PR.
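To illustrate why a lambda-built subscriber is enough for the latch use case, here is a minimal, self-contained Java sketch. `SimpleObserver` is a hypothetical stand-in for `rx.Observer` so the example needs no external library, and `LatchDemo` simulates the workflow with a plain thread; neither is code from this PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for rx.Observer, so the sketch has no dependencies.
interface SimpleObserver<T> {
    void onNext(T value);
    void onCompleted();
    void onError(Throwable error);
}

final class LatchDemo {
    // Builds an observer from two lambdas, with a no-op onCompleted.
    static <T> SimpleObserver<T> observer(Consumer<T> onNext, Consumer<Throwable> onError) {
        return new SimpleObserver<T>() {
            @Override public void onNext(T value) { onNext.accept(value); }
            @Override public void onCompleted() { }
            @Override public void onError(Throwable error) { onError.accept(error); }
        };
    }

    // Simulates a workflow that emits on another thread. Returns true if the
    // calling (test) thread observed the emission before the timeout.
    static boolean awaitJobMarkedComplete() {
        CountDownLatch jobMetadataReady = new CountDownLatch(1);
        SimpleObserver<String> subscriber = observer(
                ignored -> jobMetadataReady.countDown(),
                error -> { throw new IllegalStateException(error); }
        );
        Thread workflow = new Thread(() -> subscriber.onNext("job marked complete"));
        workflow.start();
        try {
            return jobMetadataReady.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The test thread blocks on the latch rather than sleeping, so it proceeds as soon as the emission happens.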

cdeszaq

Looks good! 👍

archolewa · 2016-09-20T14:52:00Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/JobsApiRequest.java

-        broadCastChannelPreResponseObservable.connect();
-
-        return broadCastChannelPreResponseObservable;
-    }


This has been moved into the method JobsServlet::getResults

Needs note in Changelog

archolewa · 2016-09-20T14:54:36Z

fili-core/src/test/groovy/com/yahoo/bard/webservice/web/JobsApiRequestSpec.groovy

-
-        then:
-        testSubscriber.assertReceivedOnNext([ticket1PreResponse])
-    }


This test is subsumed by tests in the JobsServletReactiveChainforResultsEndpointSpec that verify the results come back synchronously if they are ready within the asyncAfter timeout.

archolewa · 2016-09-20T14:55:54Z

fili-core/src/test/groovy/com/yahoo/bard/webservice/web/JobsApiRequestSpec.groovy

-
-        then:
-        testSubscriber.assertReceivedOnNext([ticket1PreResponse])
-    }


This is subsumed by a new test in the JobsServletReactiveChainforResultsEndpointSpec

archolewa · 2016-09-20T14:56:42Z

fili-core/src/test/groovy/com/yahoo/bard/webservice/web/JobsApiRequestSpec.groovy

-        then: "preResponseObservable is empty (the chain is complete, and no values were sent)"
-        ReactiveTestUtils.assertCompletedWithoutError(testSubscriber)
-        testSubscriber.assertNoValues()
-    }


This is subsumed by tests in the JobsServletReactiveChainforResultsEndpointSpec that verify that the getResults method returns an empty Observable if the results are not ready within the asynchronous timeout.

archolewa · 2016-09-20T14:58:33Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy

    }

-    def "If the PreResponse is not available in the PreResponseStore initially and the notification from broadcastChannel is received within the async timeout, we go to the PreResponsestore twice"() {


archolewa · 2016-09-20T14:58:48Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy

    }

-    def "If the PreResponse is available in the PreResponseStore and the notification from broadcastChannel is received within the async timeout, we go to the PreResponsestore twice"() {


Whether or not the results are available actually has zero impact on this test. This test is testing how notifications from the broadcastChannel affect processing.

RVRSKumar

Minor indentation issues

RVRSKumar · 2016-09-20T15:11:06Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/JobsApiRequest.java

    private final String filters;

    /**
     * Parses the API request URL and generates the Api Request object.
-     *
-     * @param format  response data format JSON or CSV. Default is JSON.
+     *  @param format  response data format JSON or CSV. Default is JSON.


Seems like an extra whitespace got added here.

RVRSKumar · 2016-09-20T15:12:20Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/JobsApiRequest.java

     * @param asyncAfter  How long the user is willing to wait for a synchronous request in milliseconds
     * @param perPage  number of rows to display per page of results. If present in the original request,
-     * must be a positive integer. If not present, must be the empty string.
+ * must be a positive integer. If not present, must be the empty string.


Seems like an indentation issue with the editor. Can we fix this as well?

cdeszaq · 2016-09-20T17:59:50Z

CHANGELOG.md

@@ -37,6 +41,10 @@ Current


 ### Changed:
+-  [The `TestBinderFactory` now uses the `TestAsynchronousWorkflowsBuilder`](http://github.com/yahoo/fili/pull/39)


More info needed in the changelog around the API changes

cdeszaq · 2016-09-20T18:02:52Z

.../src/main/java/com/yahoo/bard/webservice/async/preresponses/stores/HashPreResponseStore.java


 /**
 * An in-memory implementation of PreResponseStore mainly for testing purposes. It only provides functionality to save
 * an entry to store and get an entry from the store. It does not have delete functionality nor does it take care of
 * cleaning stale data.
+ * <p>
+ * Since the HashPreResponseStore is intended primarily for testing, it also includes two maps of tickets to


Some thoughts here:

1.) If this is primarily for testing, we should move this class to the test source root
2.) If this class is useful for non-testing, then we should leave it here and remove the latches (assuming they are not useful for non-testing workloads)
3.) If we want both (i.e. keep this class for real use and also give it the latch-counting behavior for tests to use), then we should have a LatchedHashPreResponseStore in the test source root that extends this one, adding the latching behavior.

Either way, things that are only for testing don't belong in code that is in the src root. That's what the test root is for.
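Option 3 could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Fili classes: `SimpleHashStore` stands in for `HashPreResponseStore`, plain strings stand in for `PreResponse`, `Optional` stands in for `Observable`, and the latch accessors are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Simplified stand-in for the store that stays in the main source root.
class SimpleHashStore {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    public boolean save(String ticket, String preResponse) {
        entries.put(ticket, preResponse);
        return true;
    }

    public Optional<String> get(String ticket) {
        return Optional.ofNullable(entries.get(ticket));
    }
}

// Test-only subclass: same storage behavior, plus latches tests can await.
class LatchedSimpleHashStore extends SimpleHashStore {
    private final Map<String, CountDownLatch> saveLatches = new ConcurrentHashMap<>();
    private final Map<String, CountDownLatch> getLatches = new ConcurrentHashMap<>();

    public CountDownLatch saveLatch(String ticket) {
        return saveLatches.computeIfAbsent(ticket, ignored -> new CountDownLatch(1));
    }

    public CountDownLatch getLatch(String ticket) {
        return getLatches.computeIfAbsent(ticket, ignored -> new CountDownLatch(1));
    }

    @Override
    public boolean save(String ticket, String preResponse) {
        boolean saved = super.save(ticket, preResponse);
        saveLatch(ticket).countDown(); // released only after the entry is stored
        return saved;
    }

    @Override
    public Optional<String> get(String ticket) {
        Optional<String> result = super.get(ticket);
        getLatch(ticket).countDown(); // released only after the lookup finished
        return result;
    }
}
```

The production class stays free of test hooks, and the subclass can live entirely under the test source root.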

cdeszaq · 2016-09-20T18:07:27Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/JobsApiRequest.java

-        broadCastChannelPreResponseObservable.connect();
-
-        return broadCastChannelPreResponseObservable;
-    }


Needs note in Changelog

cdeszaq · 2016-09-20T18:19:07Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/endpoints/JobsServlet.java

+         * subscribers. We use the replay operator so that the preResponseObservable upon connection, will begin
+         * collecting values.
+         * Once a new observer subscribes to the observable, it will have all the collected values replayed to it.
+         */


Comment should get moved inside the else where it pertains.

cdeszaq · 2016-09-20T18:19:38Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/endpoints/JobsServlet.java

+         * Once a new observer subscribes to the observable, it will have all the collected values replayed to it.
+         */
+        if (asyncAfter == JobsApiRequest.ASYNCHRONOUS_ASYNC_AFTER_VALUE) {
+            return Observable.empty();


Comment on why this is good to do would be nice. (It's not immediately clear why we're doing this)

cdeszaq · 2016-09-20T18:23:48Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/endpoints/JobsServlet.java

+                    .replay(1);
+            broadcastChannelNotifications.connect();
+            return preResponseStore.get(ticket).switchIfEmpty(
+                    applyTimeoutIfNeeded(broadcastChannelNotifications, asyncAfter).flatMap(preResponseStore::get)


Are you sure this is doing the correct thing? To me, this looks like it will wait until preResponseStore.get() completes, and then activate the "timeout" (if needed).

I'm guessing this is now doing the right thing, but it's not immediately obvious, so more comments are likely needed around what's happening, and why it's correct. (talking about the ordering of things might not hurt either)

Note that before, the timeout step was injected before the preResponseStore::get call...

Tracing through how this got called a bit more, I see how this is still doing the same thing, but it still wasn't obvious what was going on as far as ordering, so comments would definitely help make that better

Since asyncAfter is best effort for anything but always (which doesn't go to the store), it doesn't really matter either way. This basically gives the asyncAfter=0 the semantics of "If the results are available, give them to me, otherwise quickly send back the async payload."
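The ordering discussed in this thread can be spelled out with a small sketch that uses only the JDK. It is an analogy, not the RxJava chain itself: a `Function` stands in for `preResponseStore::get`, and a `CompletableFuture` stands in for the replayed broadcast channel notification (like `replay(1)`, it retains a value that arrived before anyone waited on it). The class and parameter names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class ResultsLookup {
    static Optional<String> getResults(
            Function<String, Optional<String>> store,
            CompletableFuture<String> notification,
            String ticket,
            long asyncAfterMillis
    ) {
        // Step 1: check the store first. If results exist, the timeout is irrelevant.
        Optional<String> stored = store.apply(ticket);
        if (stored.isPresent()) {
            return stored;
        }
        try {
            // Step 2: the store was empty, so wait (up to asyncAfter) for a notification.
            String readyTicket = notification.get(asyncAfterMillis, TimeUnit.MILLISECONDS);
            // Step 3: a notification arrived in time, so go back to the store.
            return store.apply(readyTicket);
        } catch (TimeoutException | ExecutionException e) {
            // No usable notification in time: the caller falls back to the async payload.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

In this model, `asyncAfter=0` returns stored results if they are already there and otherwise gives up immediately, which matches the semantics described above.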

cdeszaq · 2016-09-20T18:44:14Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/endpoints/JobsServlet.java

-         //Check the PreResponseStore to see if the PreResponse associated with the given ticket has been stored in
-        //the PreResponseStore. If not, wait for a PreResponse for the amount of time specified in async.
-        return preResponseStore.get(ticket).switchIfEmpty(broadCastChannelPreResponseObservable);
+    private <T> Observable<T> applyTimeoutIfNeeded(Observable<T> observable, long asyncAfter) {


It feels like what we're actually making is a new "operator" that extends the timeout and adds "always" and "never" semantics in the same parameter. With that in mind, does it make sense for this method to take care of all 3 options? I see it looking something like this, perhaps:

    private <T> Observable<T> applyTimeoutIfNeeded(Observable<T> primary, long asyncAfter, Observable<T> alternate) {
        return asyncAfter == JobsApiRequest.ASYNCHRONOUS_ASYNC_AFTER_VALUE ?
                // Always alternate
                alternate :
                asyncAfter == JobsApiRequest.SYNCHRONOUS_ASYNC_AFTER_VALUE ?
                        // Always primary
                        primary :
                        // Timeout-based switch
                        primary.timeout(asyncAfter, TimeUnit.MILLISECONDS, alternate);
    }

Note that I also decoupled the "alternate" stream from the method, so that it's passed in, and gave params better names.

If we have this method dealing with the "tri-state" behavior, then we can consolidate that logic in one place, and I think it may simplify the switching behavior a bit.

If you look closely, you'll see that the result of applyTimeOutIfNeeded is actually being applied inside of a preResponseStore.get().switchIfEmpty call. In other words, we aren't checking the PreResponseStore if the request is always asynchronous (there's no point), but we are in the other two cases.

So we can't really merge the three cases into a single helper method without more thought than is worth it. If we did, we'd probably just end up with implementing all of getResults in applyTimeoutIfNeeded.

ok, makes sense
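The reason the three cases do not collapse cleanly into one helper can be seen in a simplified model. The sentinel values below are hypothetical (the real constants live in `JobsApiRequest` and may differ), and `Supplier<Optional<String>>` stands in for the Observables involved.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class AsyncAfterDispatch {
    // Hypothetical sentinel values, chosen only for this illustration.
    static final long ALWAYS_ASYNCHRONOUS = -1L;
    static final long NEVER_ASYNCHRONOUS = Long.MAX_VALUE;

    enum Mode { ALWAYS, NEVER, TIMEOUT }

    static Mode classify(long asyncAfter) {
        if (asyncAfter == ALWAYS_ASYNCHRONOUS) {
            return Mode.ALWAYS;
        }
        return asyncAfter == NEVER_ASYNCHRONOUS ? Mode.NEVER : Mode.TIMEOUT;
    }

    static Optional<String> getResults(
            long asyncAfter,
            Supplier<Optional<String>> storeLookup,
            Supplier<Optional<String>> waitForNotification
    ) {
        if (classify(asyncAfter) == Mode.ALWAYS) {
            // Always asynchronous: skip the store entirely, there is no point in checking it.
            return Optional.empty();
        }
        // NEVER and TIMEOUT both check the store first; they differ only in how
        // waitForNotification is built (with or without a timeout applied).
        Optional<String> stored = storeLookup.get();
        return stored.isPresent() ? stored : waitForNotification.get();
    }
}
```

The "always" branch sits outside the store lookup while the other two sit inside its fallback, so a single helper covering all three would end up containing most of `getResults`.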

cdeszaq · 2016-09-20T19:24:14Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy


-        and: "We miss the notification that the preResponse is stored in the PreResponseStore"
+        when: "We miss the notification that the preResponse is stored in the PreResponseStore"


This should also be a given piece. The only "when" involved here is the getResults call.

cdeszaq · 2016-09-20T19:35:16Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy

-        and: "We receive the notification after async timeout"
-        broadcastChannel.publish("ticket4")
+        and: "We wait for the first attempt to get results from the store to come up empty before we add fake results"
+        getTicket1Latch.await(30, TimeUnit.SECONDS)


Why are we waiting for up to 30 seconds before continuing here? (Note that if this timeout expires, `await` just returns false, even though the latch has not been released.)

So that the test doesn't hang indefinitely in case something goes wrong and we never actually countdown the latches. I've moved that timeout into a timeout annotation on the test.
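The point about the return value is worth making concrete: `CountDownLatch.await(long, TimeUnit)` returns `false` when the timeout elapses, so a bare call lets the test carry on as if the latch had been released. A minimal sketch of a helper that turns the timeout into a failure (the `LatchAwait` class is hypothetical, not part of this PR):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchAwait {
    // Waits for the latch and fails loudly if the timeout elapses first,
    // instead of silently continuing with the latch still closed.
    static void awaitOrFail(CountDownLatch latch, long timeout, TimeUnit unit) {
        boolean released;
        try {
            released = latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for latch", e);
        }
        if (!released) {
            throw new IllegalStateException("Latch not released within " + timeout + " " + unit);
        }
    }
}
```

A test-level timeout annotation, as adopted here, addresses the hang; a check like this addresses the silent fall-through.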

cdeszaq · 2016-09-20T19:35:19Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy

-        then: "then we go to the PreResponseStore exactly once to get the ticket"
-        1 * mockPreResponseStore.get(_) >> Observable.just(Mock(PreResponse))
+        and: "We wait for the results to be successfully stored before sending a ready notification"
+        saveTicket1Latch.await(30, TimeUnit.SECONDS)


Why are we waiting for up to 30 seconds before continuing here? (Note that if this timeout expires, `await` just returns false, even though the latch has not been released.)

cdeszaq · 2016-09-20T19:38:10Z

...om/yahoo/bard/webservice/web/endpoints/JobsServletReactiveChainforResultsEndpointSpec.groovy

+        when: "We start the async chain"
+        mockJobServlet.getResults("ticket4", apiRequest1.asyncAfter)
+        //The delay is to ensure that we get the notification after async timeout
+        Thread.sleep(1000)


Can we not sleep here?

This test is testing that things behave appropriately in the case where we have a numeric asyncAfter. Unfortunately, we currently have no way of injecting latches or anything into the workflow. So our choices are to sleep (which the test was already doing before I simplified it), or refactor the JobsServlet so that we have some way of injecting additional components into the workflow.

While the second may or may not be a valid approach (depending on how much control we want to give clients over the JobServlet), it's outside the scope of this PR.

cdeszaq · 2016-09-21T19:08:39Z

CHANGELOG.md

+- [Removed `JobsApiRequest::handleBroadcastChannelNotification`](https://github.com/yahoo/fili/pull/39)
+   * That logic does not really belong in the `JobsApiRequest` (which is responsible for modeling a response, not processing it), and has
+        been consolidated into the `JobsServlet`.
+>>>>>>> Stashed changes


Some merge conflict markers in here still...

Apparently (vanilla) git says nothing if it has conflicts when applying a stash. This is annoying.

cdeszaq · 2016-09-21T19:13:52Z

fili-core/src/main/java/com/yahoo/bard/webservice/web/JobsApiRequest.java

    private final String filters;

    /**
     * Parses the API request URL and generates the Api Request object.
-     *


cdeszaq · 2016-09-22T16:30:47Z

CHANGELOG.md

-=======
-  [HashPreResponseStore moved to `test` root directory.](https://github.com/yahoo/fili/pull/39)
-   * The `HashPreResponseStore` is really intended only for testing, and does not have capabilities (i.e. TTL) that are needed for production.
+


cdeszaq · 2016-09-22T16:33:57Z

👍

cdeszaq · 2016-09-22T16:31:41Z

CHANGELOG.md

+-  [The `TestBinderFactory` now uses the `TestAsynchronousWorkflowsBuilder`](http://github.com/yahoo/fili/pull/39)
+    * This allows the asynchronous functional tests to add countdown latches to the workflows where necessary, allowing
+      for thread-safe tests.
+


There's an extra blank line in here that's not needed
/NotABlocker

dayamr · 2016-09-22T19:57:22Z

It would be good to add a test case for asyncAfter=<non-zero-integer> to check that behavior. It requires a custom timeout to mimic this behavior across the workflow, so we can treat it as a separate effort. Is this a good candidate for a separate issue? If not, what is the right way to track it?

Apart from the above comment, it looks good to me. 👍

archolewa added BREAKFIX NEED 2 REVIEWS REVIEWABLE labels Sep 15, 2016

archolewa force-pushed the fix-flaky-test branch from 3b1984e to a473e3c Compare September 15, 2016 19:43

cdeszaq reviewed Sep 15, 2016

cdeszaq added the NEED CHANGES label Sep 15, 2016

cdeszaq approved these changes Sep 16, 2016

cdeszaq added MERGEABLE NEED SQUASH and removed NEED 2 REVIEWS NEED CHANGES REVIEWABLE labels Sep 16, 2016

archolewa force-pushed the fix-flaky-test branch 2 times, most recently from 9807263 to e08cd30 Compare September 16, 2016 22:00

archolewa force-pushed the fix-flaky-test branch from e08cd30 to 1fdbde9 Compare September 16, 2016 22:01

archolewa added NEED 2 REVIEWS REVIEWABLE and removed NEED SQUASH MERGEABLE labels Sep 16, 2016

archolewa force-pushed the fix-flaky-test branch from 8e73a8b to f58b0bd Compare September 20, 2016 14:49

archolewa commented Sep 20, 2016

RVRSKumar reviewed Sep 20, 2016

archolewa force-pushed the fix-flaky-test branch from f58b0bd to 6cca0ae Compare September 20, 2016 16:00

cdeszaq reviewed Sep 20, 2016

cdeszaq requested changes Sep 20, 2016

cdeszaq added the NEED CHANGES label Sep 20, 2016

archolewa removed the NEED CHANGES label Sep 21, 2016

cdeszaq reviewed Sep 21, 2016

archolewa mentioned this pull request Sep 21, 2016

Adds documentation for topN. #43

Merged

cdeszaq reviewed Sep 22, 2016

archolewa mentioned this pull request Sep 22, 2016

Improves error when sort clause missing in TopN query #44

Merged

cdeszaq reviewed Sep 22, 2016

cdeszaq added NEED 1 REVIEW and removed NEED 2 REVIEWS labels Sep 22, 2016

archolewa added the NEED SQUASH label Sep 22, 2016

archolewa added MERGEABLE and removed NEED 1 REVIEW REVIEWABLE labels Sep 22, 2016

Adds asyncAfter keyword always.

33787a2

--`always` is guaranteed to return an asynchronous payload, regardless of how quickly the results come back. -- The asynchronous functional tests that expect an asynchronous result use the `always` keyword in order to ensure consistency.

archolewa force-pushed the fix-flaky-test branch from 0b54876 to 33787a2 Compare September 22, 2016 20:17

archolewa removed the NEED SQUASH label Sep 22, 2016

archolewa merged commit 33787a2 into master Sep 22, 2016

archolewa deleted the fix-flaky-test branch September 22, 2016 20:29
