RFC: Partial Data V2 #215
Introduction
When querying druid, druid returns any data that is available within the requested intervals while silently ignoring any missing data in those intervals. In many scenarios this behavior is undesirable: the user is unaware of missing data because druid provides no indication of it. To support a production-level data reporting system, fili fills this gap and notifies users of missing data using Partial Data V1, in which fili retrieves metadata about data availability in druid from the coordinator and behaves differently depending on the user's tolerance for missing data.
Motivation
Recently, a bug was discovered in druid where the brokers and the coordinators may be inconsistent for a short period of time with respect to data availability, due to non-atomic rearranging of data segments between different historical nodes. The broker may not return data from a segment that is loaded in druid but is currently being moved to another historical node. In this case, the coordinator indicates that the segment containing the requested data is available, while the broker returns a result that silently omits the requested data in the moving segment. This bug leads to fili caching and reporting "bad" data: a result with missing data is returned even when the API user explicitly asked for data only if all data is present. Therefore, Partial Data needs additional power to handle this situation, which leads to the idea of Partial Data V2.
Method
In druid version 0.9.0 and later, druid added a feature that returns the missing intervals for a given query in the header of the query response from the broker. Fili never took advantage of this feature, since it is not documented and Partial Data V1 was believed to be sufficient. Partial Data V2 will use this feature, in addition to the features supported in Partial Data V1, to validate that what fili expects from the broker matches what the broker actually returned.
Below is an example of a druid query that requests broker to return the missing intervals:
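A sketch of such a query, using druid's documented uncoveredIntervalsLimit context key, assuming a simple timeseries query (the datasource, aggregation, and intervals are illustrative):

```json
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "aggregations": [ { "type": "longSum", "name": "count", "fieldName": "count" } ],
  "intervals": [ "2017-01-01/2017-02-01" ],
  "context": {
    "uncoveredIntervalsLimit": 10
  }
}
```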
Below is the header from the response given by druid from the above druid query:
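A sketch of the value carried in the X-Druid-Response-Context header of the broker's response when some requested intervals are missing (the interval values are illustrative):

```json
{
  "uncoveredIntervals": [ "2017-01-05/2017-01-07", "2017-01-12/2017-01-13" ],
  "uncoveredIntervalsOverflowed": false
}
```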
In the "context" section of the druid query, a property named "uncoveredIntervalsLimit" is set to let druid know that we want the broker to return the list of intervals that are not present in the response, shown in the uncoveredIntervals property above. The value 10 tells druid to return only the first 10 contiguous uncovered intervals in the header, and to set the flag "uncoveredIntervalsOverflowed": true when there are more uncovered intervals beyond the first 10 included.

Using the "uncoveredIntervals" header information provided by the druid broker response, we can compare it to the missing intervals that fili expects from Partial Data V1. If "uncoveredIntervals" contains any interval that is not present in fili's expected missing interval list, we can send back an error response indicating the mismatch in data availability before the response is cached.

Implementation
The following design is proposed for Partial Data V2 in fili without causing any breaking changes or API changes:
Miscellaneous Preparation
- Add a property to QueryContext for druid's uncovered interval feature, druid_uncovered_interval_limit, and default it to -1, with a comment that a negative value means the feature is disabled.

Merge Druid Response Header into Druid Response Body Json Node in AsyncDruidWebServiceImplV2
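As a hedged illustration of the configuration knob above (the bard__ property namespace is an assumption for illustration, not specified in this RFC), enabling the feature might look like:

```properties
# hypothetical property name; a negative value (the default, -1) disables
# uncoveredIntervals reporting, a non-negative value enables it with that limit
bard__druid_uncovered_interval_limit = 10
```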
- Create an AsyncDruidWebServiceImplV2 class that extends AsyncDruidWebServiceImpl and overrides the sendRequest method with the following changes in addition to the original content from the parent class:
  - Extract the "X-Druid-Response-Context" header from the druid response.
  - Parse the "X-Druid-Response-Context" header as a JsonNode and merge it with the druid response body, which is already parsed into a JsonNode, into a newly created ObjectNode.
  - Pass the ObjectNode on as a JsonNode.
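The merged node described above could take a shape like the following (the top-level key holding the response body, "response", is an assumption for illustration):

```json
{
  "response": [ { "timestamp": "2017-01-01T00:00:00.000Z", "result": { "count": 42 } } ],
  "X-Druid-Response-Context": {
    "uncoveredIntervals": [ "2017-01-05/2017-01-07" ],
    "uncoveredIntervalsOverflowed": false
  }
}
```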
- In AbstractBinderFactory::buildDruidWebService, add a check for druid_uncovered_interval_limit greater than or equal to 0; if so, use AsyncDruidWebServiceImplV2, otherwise use the original implementation.

Implement PartialDataV2ResponseProcessor implementing FullResponseProcessor
- Create a FullResponseProcessor class that extends ResponseProcessor, with nothing in it, which PartialDataV2ResponseProcessor implements.
- Check uncoveredIntervalsOverflowed from "X-Druid-Response-Context" inside the JsonNode passed into PartialDataV2ResponseProcessor::processResponse; if it is true, invoke an error response saying the limit overflowed.
- Extract uncoveredIntervals from "X-Druid-Response-Context" inside the JsonNode passed into PartialDataV2ResponseProcessor::processResponse.
- Convert the uncoveredIntervals extracted above, and allAvailableIntervals extracted from the union of all the query's datasources' availabilities from DataSourceMetadataService, into SimplifiedIntervalLists.
- Compare the two SimplifiedIntervalLists above; if allAvailableIntervals has any overlap with uncoveredIntervals, invoke an error response indicating that druid is missing some data that fili expects to exist.
- Check whether the next responseProcessor is a FullResponseProcessor or not; if yes, call the next responseProcessor with the same JsonNode as passed in; otherwise call it with the response body JsonNode instead of the ObjectNode containing the extra "X-Druid-Response-Context".
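In fili this comparison runs on SimplifiedIntervalLists; the following is a minimal stdlib-only sketch of the overlap test at the heart of that step. The class and method names are hypothetical, and intervals are simplified to [start, end) millisecond pairs rather than fili's interval types:

```java
import java.util.List;

/** Hypothetical sketch: does any available interval overlap an uncovered one? */
public class IntervalOverlapCheck {

    // Each interval is a long[2] of {startMillis, endMillis}, end exclusive.
    public static boolean anyOverlap(List<long[]> available, List<long[]> uncovered) {
        for (long[] a : available) {
            for (long[] u : uncovered) {
                // Two half-open intervals overlap iff each starts before the other ends.
                if (a[0] < u[1] && u[0] < a[1]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<long[]> available = List.of(new long[]{0, 100}, new long[]{200, 300});
        // Gap fili already expected: no overlap with what druid says is available.
        List<long[]> expectedGap = List.of(new long[]{100, 200});
        // Uncovered interval inside an "available" segment: availability mismatch.
        List<long[]> mismatch = List.of(new long[]{250, 260});
        System.out.println(anyOverlap(available, expectedGap)); // false: no error
        System.out.println(anyOverlap(available, mismatch));    // true: error response
    }
}
```

If the overlap check returns true, druid reported an interval as uncovered even though the coordinator's metadata says the data is available, which is exactly the broker/coordinator inconsistency this RFC aims to surface as an error rather than cache.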
Implement PartialDataV2RequestHandler implementing DataRequestHandler
- Set the uncovered interval limit on the DruidAggregationQuery passed into DataRequestHandler::druidQuery by calling DruidQuery::withContext.
- Pass the new DruidQuery into the next request handler instead of the original druid query.
- Add PartialDataV2ResponseProcessor to the current next ResponseProcessor chain.
- Add PartialDataV2RequestHandler to DruidWorkflow between AsyncDruidRequestHandler and CacheV2RequestHandler, and include a check that druid_uncovered_interval_limit is greater than or equal to 0.