Adds some additional timings. #110

Merged: 3 commits merged into master on Dec 8, 2016
Conversation

archolewa (Contributor):

- In order to better aid performance optimization, some additional timing has been added to the request-processing workflow.

- Based on past experience, most slow queries spend the majority of their time building a DataApiRequest, so timings have been added around all the major steps in building a DataApiRequest.

- Furthermore, it's expected that the bulk of the time spent building a DataApiRequest goes to resolving filters, so some additional timings have been added to the LuceneSearchProvider, the provider most commonly used for large dimensions.

- It's also possible that a lot of the request workflow is spent serializing large Druid queries, so some timings have been added around serializing Druid queries.


this.filterBuilder = bardConfigResources.getFilterBuilder();

MetricDictionary metricDictionary = bardConfigResources
.getMetricDictionary()
.getScope(Collections.singletonList(tableName));
archolewa (Contributor Author):

This change was necessary to make metricDictionary effectively final for use in lambdas (also the previous approach was confusing).

Collaborator:

How so? Couldn't you just mark the existing local variable final and make the compiler happy?

archolewa (Contributor Author):

Because we originally assigned the metricDictionary variable at the very beginning of the method, but didn't do anything with it until we reassigned it to the above value just before actually using it.

Since it was reassigned, it was no longer effectively final, so I just got rid of the unused initialization.
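The "effectively final" constraint being discussed can be shown with a minimal, self-contained sketch (the class and method names here are illustrative, not from the PR): a local variable captured by a lambda must never be reassigned, and even a dead initial assignment that is later overwritten disqualifies it.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class EffectivelyFinalDemo {

    // Compiles: `scoped` is assigned exactly once, so it is effectively
    // final and may be captured by the lambda.
    static int scopedSize(String tableName) {
        List<String> scoped = Collections.singletonList(tableName);
        Supplier<Integer> size = () -> scoped.size();
        return size.get();
    }

    // Would NOT compile if uncommented: the second assignment means
    // `scoped` is no longer effectively final, so the lambda can't
    // capture it.
    //
    // static int brokenScopedSize(String tableName) {
    //     List<String> scoped = Collections.emptyList();   // dead initialization
    //     scoped = Collections.singletonList(tableName);   // reassignment
    //     Supplier<Integer> size = () -> scoped.size();    // compile error here
    //     return size.get();
    // }
}
```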

Collaborator:

Ah, I see, this is just a straight refactor, no actual changes. Got it.

archolewa (Contributor Author) commented Nov 28, 2016:

Note that I tried to add timings only to those steps of building the DataApiRequest that I felt were interesting or complex enough to be worth timing.

lock.readLock().lock();
RequestLog.stopTiming("LuceneReadLock");
Collaborator:

We shouldn't time this. Acquiring a read-lock should be fast enough that the timing overhead will actually dwarf the lock itself. If we really want a sense of how much time this takes, we should run a profiler against this under an actual workload.

lock.readLock().lock();
RequestLog.stopTiming("QueryingLuceneForPage" + currentPage + "ReadLock");
Collaborator:

Don't time read-locks

@@ -587,6 +591,7 @@ private Query getFilterQuery(Set<ApiFilter> filters, int perPage) {
throw new RuntimeException(e);
Collaborator:

Can't comment in the right place, but there are effectively 2 steps in this search operation:

  1. Get the Lucene results for the right page
  2. Hydrate the dim rows

I would wrap a timer around each of those high-level steps
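The two-step structure the reviewer describes might look roughly like this. `RequestLog.startTiming`/`stopTiming` are the project's real API (they appear in the diff), but a tiny stand-in timer and hypothetical step methods are used here so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStepTimingSketch {

    // Stand-in for the project's RequestLog timing API: records elapsed
    // nanoseconds per timer name.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // Hypothetical stand-ins for the two high-level steps.
    static List<Integer> queryLucene() { return Arrays.asList(1, 2, 3); }
    static List<String> hydrateDimensionRows(List<Integer> docIds) {
        List<String> rows = new ArrayList<>();
        for (int id : docIds) {
            rows.add("row-" + id);
        }
        return rows;
    }

    // One timer around each high-level step, with stopTiming in a
    // finally block so the timer closes even if the step throws.
    static List<String> search() {
        startTiming("QueryingLucene");
        List<Integer> docIds;
        try {
            docIds = queryLucene();
        } finally {
            stopTiming("QueryingLucene");
        }
        startTiming("HydratingDimensionRows");
        try {
            return hydrateDimensionRows(docIds);
        } finally {
            stopTiming("HydratingDimensionRows");
        }
    }
}
```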

archolewa (Contributor Author):

😆 I changed where I stopped the timer for querying Lucene to capture just the code that queries Lucene, but forgot to add a timer for hydrating the rows.

@@ -42,7 +43,10 @@ public Filter buildFilters(Map<Dimension, Set<ApiFilter>> filterMap) throws Dime

List<Filter> dimensionFilters = new ArrayList<>(filterMap.size());
for (Map.Entry<Dimension, Set<ApiFilter>> entry : filterMap.entrySet()) {
String filterTimerName = "Building" + entry.getKey().getApiName() + "Filters";
RequestLog.startTiming(filterTimerName);
Collaborator:

This is going to make a ton of timers and massively fill the request log. Don't do this per-event; do it at the outer layer.

If you really want per-element timings, use the raw Metrics timing framework (i.e. not the RequestLog), or write down a LogInfo object that includes a count of how many filters are being built.
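The suggested shape, sketched with a stub timer (the real code uses RequestLog; the dimension list and the filter-building step are hypothetical): one timer around the whole loop, plus a count that a LogInfo-style object could record, instead of one timer per dimension.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OuterTimerSketch {

    // Stand-in for the RequestLog timing API.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // One timer around the whole loop rather than one per dimension
    // (which would bloat the request log). The returned count is the
    // kind of number a LogInfo-style block could carry instead.
    static int buildAllFilters(List<String> dimensions) {
        startTiming("BuildingDruidFilters");
        int built = 0;
        try {
            for (String dimension : dimensions) {
                built++;  // stand-in for building the filter for `dimension`
            }
        } finally {
            stopTiming("BuildingDruidFilters");
        }
        return built;
    }
}
```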

archolewa (Contributor Author):

FYI, we already have a LogInfo object that includes a count of how many filters are being built (the Filter LogBlock).

Collaborator:

Note that the Filter log block counts the number of API filters, not the Druid query filters.

Inner log block recording information related to the API filters of a query.
~ Class-level Javadoc from the Filter log block

archolewa (Contributor Author):

Ah, good catch. I'll look into enriching that class then to include both the API filters and the Druid filters.

Collaborator:

It would be better to add another block rather than combine them, I think. Especially as we look at the possibility of breaking the Druid-specific aspects into their own module...

try {
RequestLog.startTiming("QueryingLuceneForPage" + currentPage);
Collaborator:

If there are many pages, that could create a large bloat to the request log. I would move this timing up a level. You can calculate the per-page timings using the pagination information that I think should already be in the RequestLog.

entityBody = writer.writeValueAsString(druidQuery);
} catch (JsonProcessingException e) {
throw new IllegalStateException(e);
} finally {
RequestLog.stopTiming("DruidQuerySerialization");
Collaborator:

This should include the sequence number (see the timing we do below). Otherwise, when stitching requests back together, we may have trouble.



*
* @return The value returned by the generator
*/
private <T> T timeGenerator(Producer<T> generator, String timerName) {
Collaborator:

While this solution is cute, it uglies up the calls everywhere, making the code much harder to understand. Instead, if we want to do this, we should use annotations on the "generator" methods. Take a look at the @Timed annotation in the Metrics library to see how they are doing it. Having a similar annotation for the RequestLog may be exactly what we're looking for.

archolewa (Contributor Author):

Why can't Java annotations be like Python decorators? Why???

archolewa (Contributor Author):

Based on what I've found so far, it seems that in order to do what you want, we need to dive into AspectJ or Spring's aspect-oriented programming. While that stuff seems to have merit, I really don't have time to wrap my head around a whole new programming paradigm, or to figure out how to use Spring/AspectJ to implement those annotations.

So I'm just going to brute-force wrap all the timed invocations in a try-finally block. If someone else wants to take the time to grok aspect-oriented programming in the context of AspectJ or Spring AOP and create proper annotations, they are welcome to.
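The brute-force pattern described here, alongside the higher-order alternative from earlier in the thread, sketched against a stub timer (RequestLog stands behind the real calls; the generator body and timer names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TryFinallyTiming {

    // Stand-in for the RequestLog timing API.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // Brute-force: start the timer, do the work, stop in finally so the
    // timer closes even when the work throws. Kept inside the timed
    // method so callers never see the timing code.
    static String generateLogicalTable() {
        startTiming("GeneratingLogicalTable");
        try {
            return "logicalTable";  // stand-in for the real generation step
        } finally {
            stopTiming("GeneratingLogicalTable");
        }
    }

    // The higher-order alternative, in the spirit of the timeGenerator
    // helper shown above: wrap any Supplier in the same try-finally.
    static <T> T time(Supplier<T> generator, String timerName) {
        startTiming(timerName);
        try {
            return generator.get();
        } finally {
            stopTiming(timerName);
        }
    }
}
```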

Collaborator:

Fair enough. I dug in a bit deeper too and it seems you are indeed right. That said, we should open an issue to revisit this, since it would help the RequestLog piece immensely to be able to be annotation-driven, rather than code-driven.

That said, we should make sure the timings are internal to the methods that are getting timed, since callers don't really need this complexity, and the method is better suited to handling it. (note: I've not looked at your updates yet, so this may already be the case).

archolewa (Contributor Author):

I agree 100% that making the RequestLog annotation-based would be a Good Thing(TM).

I haven't actually pushed the timings into the methods. I'll do that now.

@@ -51,7 +51,12 @@ public void invoke(JsonNode rootNode) {
HttpErrorCallback error = response.getErrorCallback(druidQuery);
FailureCallback failure = response.getFailureCallback(druidQuery);

druidWebService.postDruidQuery(context, success, error, failure, druidQuery);
try {
RequestLog.startTiming("PostingDruidQuery");
Collaborator:

This timing should be moved into the actual druid client (and it may already be there). Also, I don't think it'll time what you want, since it's only going to time how long the request takes to send, not send and come back. (yay async!)

archolewa (Contributor Author), Nov 29, 2016:

I intended to time how long it takes to send (hence the name PostingDruidQuery); it's an attempt to give us some idea of how much of the time we spend sending the Druid query is spent serializing it.

Collaborator:

Like I said, this should move into the methods doing the sending, rather than the caller. Also, the timer needs to take into account the sequence number of the query, I think, in order for log merging to work correctly.

Contributor:

This sounds like something tricky enough to justify a DruidServlet and a test case to make sure that all the phases get tested.

cdeszaq (Collaborator) commented Nov 29, 2016 via email.

archolewa (Contributor Author) commented Nov 29, 2016:

I did look into how @Timed works. And everything I found says "You need AspectJ or Spring AOP to use @Timed."

Furthermore, all the examples I found for writing custom annotations seem to assume you have your own main method where you can scan files for annotations and do something, rather than doing something special on method invocation.

cdeszaq (Collaborator) left a review comment:

Just a few more things 😄

@@ -562,7 +564,7 @@ private Query getFilterQuery(Set<ApiFilter> filters, int perPage) {
throw new PageNotFoundException(requestedPageNumber, perPage, 0);
}
}

Collaborator:

Keep this whitespace



archolewa (Contributor Author):

Opened issue #111 to make the RequestLog annotation-based.

archolewa (Contributor Author):

I know. I died a little with every line I wrote. My higher-order timing function is still an option ;)

@cdeszaq cdeszaq added this to the v0.7.x milestone Dec 1, 2016
michael-mclawhorn (Contributor):

What's the status on this? Is the job error being pursued?

archolewa (Contributor Author):

@michael-mclawhorn The status is "Oh crap the quarter's almost over and I have other things that need to get done!"

@archolewa archolewa force-pushed the add-timings branch 4 times, most recently from 8d5c155 to 0f4ddc1, December 8, 2016 16:59
archolewa (Contributor Author):

@michael-mclawhorn The status is now "I addressed the things."

michael-mclawhorn (Contributor):

👍 Seems reasonable.

michael-mclawhorn (Contributor) left a review comment:

👍

@@ -37,7 +37,8 @@ class AsyncInvalidApiRequest extends AsyncFunctionalSpec {

@Override
Map<String, Closure<Void>> getResultAssertions() {
[ data: { assert GroovyTestUtils.compareJson(it.readEntity(String), EXPECTED_ERROR_MESSAGE) } ]
[ data: { assert GroovyTestUtils.compareErrorPayload(it.readEntity(String), EXPECTED_ERROR_MESSAGE) } ]
Collaborator:

What impact does this change have?

archolewa (Contributor Author):

Thanks to a previous change, error messages now have a requestId, which is a unique id. compareErrorPayload ignores the requestId when performing a comparison. I have no idea how this test got through those changes unchanged, but it started failing when I rebased my code onto master.


import com.fasterxml.jackson.annotation.JsonAutoDetect;

import com.yahoo.bard.webservice.druid.model.filter.Filter;
Collaborator:

Move up.

filterTypeCounter.put(
currentFilter.getClass().getSimpleName(),
filterTypeCounter.getOrDefault(currentFilter.getClass().getSimpleName(), 0L) + 1L
);
Collaborator:

Can be replaced with Map::merge:

filterTypeCounter.merge(currentFilter.getClass().getSimpleName(), 1L, (old, increment) -> old + increment);
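As a self-contained illustration of the suggestion (`Long::sum` is equivalent to the `(old, increment) -> old + increment` lambda above; the real Druid filter objects are replaced here by arbitrary objects so the example runs standalone):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterTypeCount {

    // Map::merge inserts 1 the first time a type name is seen, and
    // applies Long::sum (adding 1 to the existing count) on every
    // later occurrence.
    static Map<String, Long> countByType(List<?> filters) {
        Map<String, Long> counter = new HashMap<>();
        for (Object filter : filters) {
            counter.merge(filter.getClass().getSimpleName(), 1L, Long::sum);
        }
        return counter;
    }
}
```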

filterStack.addAll(((MultiClauseFilter) currentFilter).getFields());
} else if (currentFilter instanceof NotFilter) {
filterStack.add(((NotFilter) currentFilter).getField());
}
Collaborator:

This smells a little like extensibility pain... It would be great if Java made it easy to ask whether things have various methods, but about the best we can do is add interfaces for specific methods. To that end, do we want to make HasFields and HasField interfaces, apply those interfaces to MultiClauseFilter and NotFilter (respectively), and then update this conditional to depend on those interfaces instead of these more concrete classes?

Specifically, it's the NotFilter dependency that drew my attention, since MultiClauseFilter is already abstract and (roughly) intended to serve the same purpose as the interface would.

archolewa (Contributor Author), Dec 8, 2016:

If we are going to have that kind of interface, I'd much rather just have HasFields (Not can just return a singleton set). Though I'd prefer a better name than HasFields. I find that kind of convention hideously ugly in Java. Perhaps something like ComplexFilter? i.e. a Filter that's built out of other filters?

Collaborator:

To me, Complex isn't specific enough. HasFilters is nice, in a way, because it fits (roughly) the bean convention of getFilters / setFilters roughly translating to the filters property present in Groovy and C# for example.

That said, I'm happy with ComplexFilter meaning "has a getFilters method". In part, I think we're fighting the generality of Druid's filters and their decision to use "field" to mean "another filter".
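A sketch of the interface being settled on. The names follow the thread's proposal (a `ComplexFilter` interface for "a filter built out of other filters", with Not exposing its single wrapped filter as a singleton list); the real Fili classes are reduced to stubs here:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class ComplexFilterSketch {

    interface Filter { }

    // "Has a getFields method": a filter built out of other filters.
    interface ComplexFilter extends Filter {
        List<Filter> getFields();
    }

    // Stand-in for a MultiClauseFilter subclass such as And/Or.
    static class AndFilter implements ComplexFilter {
        private final List<Filter> fields;
        AndFilter(List<Filter> fields) { this.fields = fields; }
        @Override public List<Filter> getFields() { return fields; }
    }

    // Not wraps a single filter, but can expose it as a singleton list.
    static class NotFilter implements ComplexFilter {
        private final Filter field;
        NotFilter(Filter field) { this.field = field; }
        @Override public List<Filter> getFields() { return Collections.singletonList(field); }
    }

    // The traversal now branches on one interface instead of two
    // concrete classes.
    static int countNodes(Filter root) {
        Deque<Filter> stack = new ArrayDeque<>();
        stack.push(root);
        int count = 0;
        while (!stack.isEmpty()) {
            Filter current = stack.pop();
            count++;
            if (current instanceof ComplexFilter) {
                ((ComplexFilter) current).getFields().forEach(stack::push);
            }
        }
        return count;
    }
}
```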

cdeszaq (Collaborator) commented Dec 8, 2016:

Checkstyle needs a fix... otherwise I think this is good to squash into logical commits and merge.

- A map is logged that describes the structure of the filter being sent to Druid. For each filter type, it includes a count of the number of instances of that filter.
@archolewa archolewa merged commit 896fbc9 into master Dec 8, 2016
@archolewa archolewa deleted the add-timings branch December 8, 2016 23:00