RFC: Partial Data V2 #215

garyluoex · 2017-03-30T19:20:15Z

RFC: Partial Data V2

Feature Name: Partial Data V2
Start Date: 2017-04-21
RFC PR: TBD WIP

Introduction

When queried, Druid returns whatever data is available within the requested intervals and silently ignores any data that is missing from them. In many scenarios this behavior is undesirable: the user has no way of knowing that data is missing from the result, because Druid gives no indication of it. To support a production-level data reporting system, Fili fills this gap with Partial Data V1: Fili retrieves metadata about data availability in Druid from the Coordinator and behaves differently depending on how the user expects missing data to be handled.

Motivation

Recently, a bug was discovered in Druid: the Brokers and the Coordinators can disagree about data availability for a short period of time, because data segments are rearranged between Historical nodes non-atomically. A Broker might not return data from a segment that is loaded in Druid but is currently being moved to another Historical node. In this case the Coordinator reports that the segment containing the requested data is available, while the Broker returns a result that lacks the data in the moving segment, without any indication. This bug leads Fili to cache and report "bad" data: a result with missing data is returned even when the API user explicitly asks for data only if all of it is present. Partial Data therefore needs additional power to handle this situation, which leads to the idea of Partial Data V2.

Method

Druid 0.9.0 and later include a feature that returns the missing intervals for a given query in a header of the Broker's query response. Fili has never taken advantage of this feature, because it is not documented and Partial Data V1 was believed to be sufficient. Partial Data V2 will use this feature in addition to the features supported in Partial Data V1, and will validate that what Fili expects from the Broker matches what the Broker actually returned.

Below is an example of a Druid query that asks the Broker to return the missing intervals:

Content-Type: application/json
{
    "queryType": "groupBy",
    "dataSource": "semiAvailableTable",
    "granularity": "day",
    "dimensions": [ "line_id" ],
    "aggregations": [ { "type": "longSum", "name": "myMetric", "fieldName": "myMetric" } ],
    "intervals": [ "2016-11-21/2017-12-19" ],
    "context": { "uncoveredIntervalsLimit": 10 }
}

Below are the response headers Druid returned for the above query:

200 OK
Date:  Mon, 10 Apr 2017 16:24:24 GMT
Content-Type:  application/json
X-Druid-Query-Id:  92c81bed-d9e6-4242-836b-0fcd1efdee9e
X-Druid-Response-Context: {
"uncoveredIntervals": [
    "2016-11-22T00:00:00.000Z/2016-12-18T00:00:00.000Z",
    "2016-12-25T00:00:00.000Z/2017-01-03T00:00:00.000Z",
    "2017-01-31T00:00:00.000Z/2017-02-01T00:00:00.000Z",
    "2017-02-08T00:00:00.000Z/2017-02-09T00:00:00.000Z",
    "2017-02-10T00:00:00.000Z/2017-02-13T00:00:00.000Z",
    "2017-02-16T00:00:00.000Z/2017-02-20T00:00:00.000Z",
    "2017-02-22T00:00:00.000Z/2017-02-25T00:00:00.000Z",
    "2017-02-26T00:00:00.000Z/2017-03-01T00:00:00.000Z",
    "2017-03-04T00:00:00.000Z/2017-03-05T00:00:00.000Z",
    "2017-03-08T00:00:00.000Z/2017-03-09T00:00:00.000Z"
],
"uncoveredIntervalsOverflowed": true
}
Content-Encoding:  gzip
Vary:  Accept-Encoding, User-Agent
Transfer-Encoding:  chunked
Server:  Jetty(9.2.5.v20141112)

In the "context" section of the Druid query, the property "uncoveredIntervalsLimit" tells Druid that we want the Broker to return a list of the intervals that are not present in the response; the list appears in the "uncoveredIntervals" property shown above. The value 10 tells Druid to return only the first 10 contiguous uncovered intervals in the header, and to set the flag "uncoveredIntervalsOverflowed": true when there are more uncovered intervals beyond the first 10.
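To make the header structure concrete, here is a minimal sketch of reading the two properties out of the X-Druid-Response-Context value. The class and method names are hypothetical and not part of Fili or Druid, and the regular expressions stand in for a real JSON parser (the design below parses the header into a JsonNode instead) so that the example stays self-contained.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Fili or Druid class.
final class ResponseContextSketch {
    // One quoted ISO-8601 "start/end" interval, as seen in "uncoveredIntervals".
    private static final Pattern INTERVAL =
            Pattern.compile("\"([0-9T:.Z-]+)/([0-9T:.Z-]+)\"");
    // The overflow flag set to true.
    private static final Pattern OVERFLOWED =
            Pattern.compile("\"uncoveredIntervalsOverflowed\"\\s*:\\s*true");

    /** Returns each uncovered interval as a two-element array: {start, end}. */
    static List<Instant[]> uncoveredIntervals(String headerValue) {
        List<Instant[]> intervals = new ArrayList<>();
        Matcher matcher = INTERVAL.matcher(headerValue);
        while (matcher.find()) {
            intervals.add(new Instant[] {
                    Instant.parse(matcher.group(1)),
                    Instant.parse(matcher.group(2))
            });
        }
        return intervals;
    }

    /** True when Druid reported more uncovered intervals than the limit allowed. */
    static boolean overflowed(String headerValue) {
        return OVERFLOWED.matcher(headerValue).find();
    }
}
```

For the header shown above, `uncoveredIntervals` would yield ten intervals and `overflowed` would be true, which means the list is incomplete and cannot be trusted as the full set of gaps.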

Using the "uncoveredIntervals" information in the Broker's response header, we can compare it with the missing intervals that Fili expects from Partial Data V1. If "uncoveredIntervals" contains any interval that is not in Fili's list of expected missing intervals, we can send back an error response indicating the mismatch in data availability before the response is cached.
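The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not Fili code: `Span` is a hypothetical stand-in for an interval type, and the expected-missing list is assumed to be already simplified (sorted, with overlapping and abutting intervals merged), so an uncovered interval is expected exactly when a single expected-missing interval contains it.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical sketch; names are illustrative and not part of Fili.
final class AvailabilityCheckSketch {
    /** Half-open interval [start, end). */
    static final class Span {
        final Instant start;
        final Instant end;

        Span(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        static Span of(String start, String end) {
            return new Span(Instant.parse(start), Instant.parse(end));
        }

        /** True when other lies entirely inside this span. */
        boolean contains(Span other) {
            return !other.start.isBefore(start) && !other.end.isAfter(end);
        }
    }

    /**
     * True when Druid reports a gap that Fili did not expect.
     * Assumes expectedMissing is simplified (no overlapping or abutting spans).
     */
    static boolean hasUnexpectedGap(List<Span> uncovered, List<Span> expectedMissing) {
        for (Span gap : uncovered) {
            boolean expected = false;
            for (Span missing : expectedMissing) {
                if (missing.contains(gap)) {
                    expected = true;
                    break;
                }
            }
            if (!expected) {
                return true; // mismatch: respond with an error before caching
            }
        }
        return false;
    }
}
```

The simplification assumption matters: if two abutting expected-missing intervals were left unmerged, a gap spanning both would be flagged as unexpected even though it is fully accounted for.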

Implementation

The following design is proposed for Partial Data V2 in Fili; it causes no breaking changes and no API changes:

Miscellaneous Preparation
- Add new query context "uncoveredIntervalsLimit" into QueryContext for druid's uncovered interval feature
- Add a configurable property named druid_uncovered_interval_limit, default it to -1, and comment that a negative value disables the feature
- Add new response error messages as needed by Partial Data V2
Merge Druid Response Header into Druid Response Body Json Node in AsyncDruidWebServiceImplV2
- Implement a new AsyncDruidWebServiceImplV2 class that extends AsyncDruidWebServiceImpl and overrides the sendRequest method, making the following changes in addition to the original content from the parent class
- Retrieve "X-Druid-Response-Context" header from the druid response.
- Add both "X-Druid-Response-Context" parsed as JsonNode and the druid response body that is already parsed into JsonNode into a newly created ObjectNode
- Return the newly created ObjectNode as JsonNode
- In AbstractBinderFactory::buildDruidWebService, check whether druid_uncovered_interval_limit is greater than or equal to 0; if so, use AsyncDruidWebServiceImplV2, otherwise use the original one
Implement PartialDataV2ResponseProcessor implementing FullResponseProcessor
- Create a new FullResponseProcessor class that extends ResponseProcessor and has nothing in it; PartialDataV2ResponseProcessor implements it
- Check the response status code: if it is 304, invoke the next response processor directly, following the rules in the last bullet point of this section; if it is 200, do the following
- Extract uncoveredIntervalsOverflowed from the X-Druid-Response-Context inside the JsonNode passed into PartialDataV2ResponseProcessor::processResponse; if it is true, invoke an error response saying the limit overflowed
- Extract uncoveredIntervals from the X-Druid-Response-Context inside the JsonNode passed into PartialDataV2ResponseProcessor::processResponse
- Parse both the uncoveredIntervals extracted above and allAvailableIntervals, taken from the union of the availabilities of all the query's data sources in DataSourceMetadataService, into SimplifiedIntervalLists
- Compare the two SimplifiedIntervalLists: if allAvailableIntervals overlaps uncoveredIntervals at all, invoke an error response indicating that Druid is missing some data that Fili expects to exist
- Otherwise, check whether the next responseProcessor is a FullResponseProcessor: if so, call it with the same JsonNode that was passed in; otherwise call it with the response body JsonNode instead of the ObjectNode containing the extra "X-Druid-Response-Context"
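The decision flow and the forwarding rule in this list can be sketched roughly as below. Everything here is a simplified assumption: the nested interfaces only mimic the shape of ResponseProcessor and FullResponseProcessor (the real processResponse takes a JsonNode and further arguments), a Map stands in for the merged ObjectNode, the "response" key for the body is a made-up name, and status codes other than 304 are treated like 200 because the RFC does not cover them.

```java
import java.util.Map;

// Hypothetical sketch of the processor's control flow; not Fili code.
final class ForwardingSketch {
    static final String CONTEXT_KEY = "X-Druid-Response-Context";
    static final String BODY_KEY = "response"; // assumed key name for the body

    /** Simplified stand-in for Fili's ResponseProcessor. */
    interface ResponseProcessor {
        void processResponse(Object json);
    }

    /** Marker type: processors that want the header context as well as the body. */
    interface FullResponseProcessor extends ResponseProcessor { }

    enum Outcome { FORWARD, ERROR_OVERFLOW, ERROR_MISSING_DATA }

    /** Mirrors the bullets above: 304 skips validation, then overflow, then overlap. */
    static Outcome decide(int status, boolean overflowed, boolean uncoveredOverlapsAvailable) {
        if (status == 304) {
            return Outcome.FORWARD;
        }
        if (overflowed) {
            return Outcome.ERROR_OVERFLOW;
        }
        if (uncoveredOverlapsAvailable) {
            return Outcome.ERROR_MISSING_DATA;
        }
        return Outcome.FORWARD;
    }

    /** Full processors get the merged node; others get only the response body. */
    static void forward(Map<String, Object> merged, ResponseProcessor next) {
        if (next instanceof FullResponseProcessor) {
            next.processResponse(merged);
        } else {
            next.processResponse(merged.get(BODY_KEY));
        }
    }
}
```

Using an empty marker type keeps existing processors unchanged: only processors that opt in by implementing FullResponseProcessor ever see the extra header context.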
Implement PartialDataV2RequestHandler implementing DataRequestHandler
- Add the "uncoveredIntervalsLimit: $druid_uncovered_interval_limit" context into DruidAggregationQuery passed into DataRequestHandler::druidQuery by calling DruidQuery::withContext
- Pass the above modified DruidQuery into the next request handler instead of the original druid query
- Append PartialDataV2ResponseProcessor to the current next ResponseProcessor chain
- Add PartialDataV2RequestHandler to DruidWorkflow between AsyncDruidRequestHandler and CacheV2RequestHandler, and include a check that druid_uncovered_interval_limit is greater than or equal to 0
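As a rough illustration of the request-side step, the sketch below adds the limit to a query context only when the feature is enabled. It is not the Fili implementation: a plain Map stands in for the query context, and the class and method names are hypothetical. It copies the context rather than mutating it, in the spirit of DruidQuery::withContext returning a modified query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Fili's query context.
final class UncoveredLimitSketch {
    static final String LIMIT_KEY = "uncoveredIntervalsLimit";

    /**
     * Returns a context carrying the limit, or the original context untouched
     * when the limit is negative (the feature is disabled).
     */
    static Map<String, Object> withUncoveredLimit(Map<String, Object> context, int limit) {
        if (limit < 0) {
            return context;
        }
        Map<String, Object> copy = new LinkedHashMap<>(context);
        copy.put(LIMIT_KEY, limit);
        return copy;
    }
}
```

With the default of -1 the query is passed along unchanged, so deployments that do not set druid_uncovered_interval_limit see no difference in the queries sent to Druid.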


cdeszaq · 2017-04-13T16:37:55Z

A few questions:

In which version of Druid is this available?
What is the "uncoveredIntervalsLimit": 10 in the context block of the query doing?
What is the structure of the X-Druid-Response-Context header?
- What do the relevant sections of that header mean?
Are details about how you're thinking this will get used or get hooked into Fili still being worked on?

cdeszaq · 2017-04-24T17:08:07Z

This looks pretty solid. 👍

cdeszaq · 2017-04-25T15:47:46Z

Some off-line design notes:

Note: Hooks up with Cache v3 as well.

QubitPi · 2017-05-04T17:40:31Z

👍 Good design and article

QubitPi · 2017-05-05T19:48:20Z

@garyluoex For the third implementation step, by "FullResponseProcessor class" do we mean a FullResponseProcessor interface that extends the ResponseProcessor interface?

garyluoex added WIP DESIGN labels Mar 30, 2017

cdeszaq assigned garyluoex Apr 13, 2017

garyluoex added REVIEWABLE and removed WIP labels Apr 21, 2017

garyluoex removed the REVIEWABLE label Apr 24, 2017

This was referenced May 5, 2017

Prepare For Partial Data V2 #264

Merged

Merge Header Info into JsonNode #267

Merged

This was referenced May 17, 2017

Implement DruidPartialDataResponseProcessor #275

Merged

Implement DruidPartialDataRequestHandler #287

Merged
