Adds some additional timings. #110

Merged: 3 commits merged into master on Dec 8, 2016
Conversation

archolewa (Contributor):

- In order to better aid performance optimization, some additional timing has been added to the request-processing workflow.

- Based on past experience, most slow queries spend the majority of their time building a DataApiRequest, so timings have been added around all the major steps in building a DataApiRequest.

- Furthermore, it's expected that the bulk of the time spent building a DataApiRequest goes to resolving filters, so some additional timings have been added to the LuceneSearchProvider, the provider most commonly used for large dimensions.

- It's also possible that a lot of the request workflow is spent serializing large Druid queries, so some timings have been added around serializing Druid queries.


this.filterBuilder = bardConfigResources.getFilterBuilder();

MetricDictionary metricDictionary = bardConfigResources
.getMetricDictionary()
.getScope(Collections.singletonList(tableName));
archolewa (Contributor Author):

This change was necessary to make metricDictionary effectively final for use in lambdas (also the previous approach was confusing).

Collaborator:

How so? Couldn't you just mark the existing local variable final and make the compiler happy?

archolewa (Contributor Author):

Because we originally assigned the metricDictionary variable at the very beginning of the method, but didn't do anything with it until we reassigned it to the above value just before actually using it.

Since it was reassigned, it was no longer effectively final, so I just got rid of the unused initialization.
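The "effectively final" constraint being discussed can be shown with a minimal, self-contained sketch (the class and method names here are illustrative, not from the PR): a local variable captured by a lambda must never be reassigned, and even a dead initial assignment that is later overwritten disqualifies it.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class EffectivelyFinalDemo {

    // Compiles: `scoped` is assigned exactly once, so it is effectively
    // final and may be captured by the lambda.
    static int scopedSize(String tableName) {
        List<String> scoped = Collections.singletonList(tableName);
        Supplier<Integer> size = () -> scoped.size();
        return size.get();
    }

    // Would NOT compile if uncommented: the second assignment means
    // `scoped` is no longer effectively final, so the lambda can't
    // capture it.
    //
    // static int brokenScopedSize(String tableName) {
    //     List<String> scoped = Collections.emptyList();   // dead initialization
    //     scoped = Collections.singletonList(tableName);   // reassignment
    //     Supplier<Integer> size = () -> scoped.size();    // compile error here
    //     return size.get();
    // }
}
```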

Collaborator:

Ah, I see, this is just a straight refactor, no actual changes. Got it.

archolewa (Contributor Author) commented Nov 28, 2016:

Note that I tried to add timings only to those steps of building the DataApiRequest that I felt were interesting or complex enough to be worth timing.

lock.readLock().lock();
RequestLog.stopTiming("LuceneReadLock");
Collaborator:

We shouldn't time this. Acquiring a read-lock should be fast enough that the timing overhead will actually dwarf the lock itself. If we really want a sense of how much time this takes, we should run a profiler against this under an actual workload.

lock.readLock().lock();
RequestLog.stopTiming("QueryingLuceneForPage" + currentPage + "ReadLock");
Collaborator:

Don't time read-locks

@@ -587,6 +591,7 @@ private Query getFilterQuery(Set<ApiFilter> filters, int perPage) {
throw new RuntimeException(e);
Collaborator:

Can't comment in the right place, but there are effectively 2 steps in this search operation:

  1. Get the Lucene results for the right page
  2. Hydrate the dim rows

I would wrap a timer around each of those high-level steps
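The two-step structure the reviewer describes might look roughly like this. `RequestLog.startTiming`/`stopTiming` are the project's real API (they appear in the diff), but a tiny stand-in timer and hypothetical step methods are used here so the sketch is self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStepTimingSketch {

    // Stand-in for the project's RequestLog timing API: records elapsed
    // nanoseconds per timer name.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // Hypothetical stand-ins for the two high-level steps.
    static List<Integer> queryLucene() { return Arrays.asList(1, 2, 3); }
    static List<String> hydrateDimensionRows(List<Integer> docIds) {
        List<String> rows = new ArrayList<>();
        for (int id : docIds) {
            rows.add("row-" + id);
        }
        return rows;
    }

    // One timer around each high-level step, with stopTiming in a
    // finally block so the timer closes even if the step throws.
    static List<String> search() {
        startTiming("QueryingLucene");
        List<Integer> docIds;
        try {
            docIds = queryLucene();
        } finally {
            stopTiming("QueryingLucene");
        }
        startTiming("HydratingDimensionRows");
        try {
            return hydrateDimensionRows(docIds);
        } finally {
            stopTiming("HydratingDimensionRows");
        }
    }
}
```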

archolewa (Contributor Author):

😆 I changed where I stopped the timer for querying Lucene to capture just the code that queries Lucene, but forgot to add a timer for hydrating the rows.

@@ -42,7 +43,10 @@ public Filter buildFilters(Map<Dimension, Set<ApiFilter>> filterMap) throws Dime

List<Filter> dimensionFilters = new ArrayList<>(filterMap.size());
for (Map.Entry<Dimension, Set<ApiFilter>> entry : filterMap.entrySet()) {
String filterTimerName = "Building" + entry.getKey().getApiName() + "Filters";
RequestLog.startTiming(filterTimerName);
Collaborator:

This is going to make a ton of timers and massively fill the request log. Don't do this per-event; do it at the outer layer.

If you really want per-element timings, use the raw Metrics timing framework (i.e. not the RequestLog), or write down a LogInfo object that includes a count of how many filters are being built.
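The suggested shape, sketched with a stub timer (the real code uses RequestLog; the dimension list and the filter-building step are hypothetical): one timer around the whole loop, plus a count that a LogInfo-style object could record, instead of one timer per dimension.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OuterTimerSketch {

    // Stand-in for the RequestLog timing API.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // One timer around the whole loop rather than one per dimension
    // (which would bloat the request log). The returned count is the
    // kind of number a LogInfo-style block could carry instead.
    static int buildAllFilters(List<String> dimensions) {
        startTiming("BuildingDruidFilters");
        int built = 0;
        try {
            for (String dimension : dimensions) {
                built++;  // stand-in for building the filter for `dimension`
            }
        } finally {
            stopTiming("BuildingDruidFilters");
        }
        return built;
    }
}
```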

archolewa (Contributor Author):

FYI, we already have a LogInfo object that includes a count of how many filters are being built (the Filter LogBlock).

Collaborator:

Note that the Filter log block counts the number of API filters, not the Druid query filters.

Inner log block recording information related to the API filters of a query.
~ Class-level Javadoc from the Filter log block

archolewa (Contributor Author):

Ah, good catch. I'll look into enriching that class then to include both the API filters and the Druid filters.

Collaborator:

It would be better to add another block rather than combine them, I think. Especially as we look at the possibility of breaking the Druid-specific aspects into their own module...

try {
RequestLog.startTiming("QueryingLuceneForPage" + currentPage);
Collaborator:

If there are many pages, that could create a large bloat to the request log. I would move this timing up a level. You can calculate the per-page timings using the pagination information that I think should already be in the RequestLog.

entityBody = writer.writeValueAsString(druidQuery);
} catch (JsonProcessingException e) {
throw new IllegalStateException(e);
} finally {
RequestLog.stopTiming("DruidQuerySerialization");
Collaborator:

This should include the sequence number (see the timing we do below). Otherwise, when stitching requests back together, we may have trouble.



*
* @return The value returned by the generator
*/
private <T> T timeGenerator(Producer<T> generator, String timerName) {
Collaborator:

While this solution is cute, it uglies up the calls everywhere, making the code much harder to understand. Instead, if we want to do this, we should use annotations on the "generator" methods. Take a look at the @Timed annotation in the Metrics library to see how they are doing it. Having a similar annotation for the RequestLog may be exactly what we're looking for.

archolewa (Contributor Author):

Why can't Java annotations be like Python decorators? Why???

archolewa (Contributor Author):

Based on what I've found so far, it seems that in order to do what you want, we need to dive into AspectJ or Spring's aspect-oriented programming. While that stuff seems to have merit, I really don't have time to wrap my head around a whole new programming paradigm, or to figure out how to use Spring/AspectJ to implement those annotations.

So I'm just going to brute-force wrap all the timed invocations in a try-finally block. If someone else wants to take the time to grok aspect-oriented programming in the context of AspectJ or Spring AOP and create proper annotations, they are welcome to.
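The brute-force pattern described here, alongside the higher-order alternative from earlier in the thread, sketched against a stub timer (RequestLog stands behind the real calls; the generator body and timer names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TryFinallyTiming {

    // Stand-in for the RequestLog timing API.
    static final Map<String, Long> TIMERS = new HashMap<>();
    static void startTiming(String name) { TIMERS.put(name, System.nanoTime()); }
    static void stopTiming(String name) { TIMERS.put(name, System.nanoTime() - TIMERS.get(name)); }

    // Brute-force: start the timer, do the work, stop in finally so the
    // timer closes even when the work throws. Kept inside the timed
    // method so callers never see the timing code.
    static String generateLogicalTable() {
        startTiming("GeneratingLogicalTable");
        try {
            return "logicalTable";  // stand-in for the real generation step
        } finally {
            stopTiming("GeneratingLogicalTable");
        }
    }

    // The higher-order alternative, in the spirit of the timeGenerator
    // helper shown above: wrap any Supplier in the same try-finally.
    static <T> T time(Supplier<T> generator, String timerName) {
        startTiming(timerName);
        try {
            return generator.get();
        } finally {
            stopTiming(timerName);
        }
    }
}
```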

Collaborator:

Fair enough. I dug in a bit deeper too and it seems you are indeed right. That said, we should open an issue to revisit this, since it would help the RequestLog piece immensely to be able to be annotation-driven, rather than code-driven.

That said, we should make sure the timings are internal to the methods that are getting timed, since callers don't really need this complexity, and the method is better suited to handling it. (note: I've not looked at your updates yet, so this may already be the case).

archolewa (Contributor Author):

I agree 100% that making the RequestLog annotation-based would be a Good Thing(TM).

I haven't actually pushed the timings into the methods. I'll do that now.

@@ -51,7 +51,12 @@ public void invoke(JsonNode rootNode) {
HttpErrorCallback error = response.getErrorCallback(druidQuery);
FailureCallback failure = response.getFailureCallback(druidQuery);

druidWebService.postDruidQuery(context, success, error, failure, druidQuery);
try {
RequestLog.startTiming("PostingDruidQuery");
Collaborator:

This timing should be moved into the actual druid client (and it may already be there). Also, I don't think it'll time what you want, since it's only going to time how long the request takes to send, not send and come back. (yay async!)

archolewa (Contributor Author), Nov 29, 2016:

I intended to time how long it takes to send (hence the name PostingDruidQuery); it's an attempt to give us some idea of how much of the time we spend sending the Druid query is spent serializing it.

Collaborator:

Like I said, this should move into the methods doing the sending, rather than the caller. Also, the timer needs to take into account the sequence number of the query, I think, in order for log merging to work correctly.

Contributor:

This sounds like something tricky enough to justify a DruidServlet and a test case to make sure that all the phases get tested.

cdeszaq (Collaborator) commented Nov 29, 2016 via email.

archolewa (Contributor Author) commented Nov 29, 2016:

I did look into how @Timed works. And everything I found says "You need AspectJ or Spring AOP to use @Timed."

Furthermore, all the examples I found for writing custom annotations seem to assume you have your own main method where you can scan files for annotations and do something, rather than doing something special on method invocation.

cdeszaq (Collaborator) left a review comment:

Just a few more things 😄

@@ -562,7 +564,7 @@ private Query getFilterQuery(Set<ApiFilter> filters, int perPage) {
throw new PageNotFoundException(requestedPageNumber, perPage, 0);
}
}

Collaborator:

Keep this whitespace



archolewa (Contributor Author):

Opened issue #111 to make the RequestLog annotation-based.

archolewa (Contributor Author):

I know. I died a little with every line I wrote. My higher-order timing function is still an option ;)

@cdeszaq cdeszaq added this to the v0.7.x milestone Dec 1, 2016
michael-mclawhorn (Contributor):

What's the status on this? Is the job error being pursued?

archolewa (Contributor Author):

@michael-mclawhorn The status is "Oh crap the quarter's almost over and I have other things that need to get done!"

@archolewa archolewa force-pushed the add-timings branch 4 times, most recently from 8d5c155 to 0f4ddc1, December 8, 2016 16:59
archolewa (Contributor Author):

@michael-mclawhorn The status is now "I addressed the things."

michael-mclawhorn (Contributor):

👍 Seems reasonable.

michael-mclawhorn (Contributor) left a review comment:

👍

@@ -37,7 +37,8 @@ class AsyncInvalidApiRequest extends AsyncFunctionalSpec {

@Override
Map<String, Closure<Void>> getResultAssertions() {
[ data: { assert GroovyTestUtils.compareJson(it.readEntity(String), EXPECTED_ERROR_MESSAGE) } ]
[ data: { assert GroovyTestUtils.compareErrorPayload(it.readEntity(String), EXPECTED_ERROR_MESSAGE) } ]
Collaborator:

What impact does this change have?

archolewa (Contributor Author):

Thanks to a previous change, error messages now have a requestId, which is a unique id. compareErrorPayload ignores the requestId when performing a comparison. I have no idea how this test got through those changes unchanged, but it started failing when I rebased my code onto master.


import com.fasterxml.jackson.annotation.JsonAutoDetect;

import com.yahoo.bard.webservice.druid.model.filter.Filter;
Collaborator:

Move up.

filterTypeCounter.put(
currentFilter.getClass().getSimpleName(),
filterTypeCounter.getOrDefault(currentFilter.getClass().getSimpleName(), 0L) + 1L
);
Collaborator:

Can be replaced with Map::merge:

filterTypeCounter.merge(currentFilter.getClass().getSimpleName(), 1L, (old, increment) -> old + increment);
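As a self-contained illustration of the suggestion (`Long::sum` is equivalent to the `(old, increment) -> old + increment` lambda above; the real Druid filter objects are replaced here by arbitrary objects so the example runs standalone):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterTypeCount {

    // Map::merge inserts 1 the first time a type name is seen, and
    // applies Long::sum (adding 1 to the existing count) on every
    // later occurrence.
    static Map<String, Long> countByType(List<?> filters) {
        Map<String, Long> counter = new HashMap<>();
        for (Object filter : filters) {
            counter.merge(filter.getClass().getSimpleName(), 1L, Long::sum);
        }
        return counter;
    }
}
```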

filterStack.addAll(((MultiClauseFilter) currentFilter).getFields());
} else if (currentFilter instanceof NotFilter) {
filterStack.add(((NotFilter) currentFilter).getField());
}
Collaborator:

This smells a little like extensibility pain... It would be great if Java made it easy to ask whether things have various methods, but about the best we can do is add interfaces for specific methods. To that end, do we want to make HasFields and HasField interfaces, apply those interfaces to MultiClauseFilter and NotFilter (respectively), and then update this conditional to depend on those interfaces instead of these more concrete classes?

Specifically, it's the NotFilter dependency that drew my attention, since MultiClauseFilter is already abstract and (roughly) intended to serve the same purpose as the interface would.

archolewa (Contributor Author), Dec 8, 2016:

If we are going to have that kind of interface, I'd much rather just have HasFields (Not can just return a singleton set). Though I'd prefer a better name than HasFields. I find that kind of convention hideously ugly in Java. Perhaps something like ComplexFilter? i.e. a Filter that's built out of other filters?

Collaborator:

To me, Complex isn't specific enough. HasFilters is nice, in a way, because it fits (roughly) the bean convention of getFilters / setFilters roughly translating to the filters property present in Groovy and C# for example.

That said, I'm happy with ComplexFilter meaning "has a getFilters method". In part, I think we're fighting the generality of Druid's filters and their decision to use "field" to mean "another filter".
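A sketch of the interface being settled on. The names follow the thread's proposal (a `ComplexFilter` interface for "a filter built out of other filters", with Not exposing its single wrapped filter as a singleton list); the real Fili classes are reduced to stubs here:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class ComplexFilterSketch {

    interface Filter { }

    // "Has a getFields method": a filter built out of other filters.
    interface ComplexFilter extends Filter {
        List<Filter> getFields();
    }

    // Stand-in for a MultiClauseFilter subclass such as And/Or.
    static class AndFilter implements ComplexFilter {
        private final List<Filter> fields;
        AndFilter(List<Filter> fields) { this.fields = fields; }
        @Override public List<Filter> getFields() { return fields; }
    }

    // Not wraps a single filter, but can expose it as a singleton list.
    static class NotFilter implements ComplexFilter {
        private final Filter field;
        NotFilter(Filter field) { this.field = field; }
        @Override public List<Filter> getFields() { return Collections.singletonList(field); }
    }

    // The traversal now branches on one interface instead of two
    // concrete classes.
    static int countNodes(Filter root) {
        Deque<Filter> stack = new ArrayDeque<>();
        stack.push(root);
        int count = 0;
        while (!stack.isEmpty()) {
            Filter current = stack.pop();
            count++;
            if (current instanceof ComplexFilter) {
                ((ComplexFilter) current).getFields().forEach(stack::push);
            }
        }
        return count;
    }
}
```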

cdeszaq (Collaborator) commented Dec 8, 2016:

Checkstyle needs a fix... otherwise I think this is good to squash into logical commits and merge.

- A map is logged that describes the structure of the filter being sent to Druid. For each filter type, it includes a count of the number of instances of that filter.
@archolewa archolewa merged commit 896fbc9 into master Dec 8, 2016
@archolewa archolewa deleted the add-timings branch December 8, 2016 23:00