Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RowGrouping to fix Row Merging on MultiEngine Queries #447

Merged
merged 14 commits into from
May 9, 2019

Conversation

ryankwagner
Copy link
Collaborator

No description provided.

@patelh
Copy link
Collaborator

patelh commented Apr 30, 2019

Dim driven or fact driven?

@ryankwagner ryankwagner changed the title Add test to illustrate MultiEngineQuery issue. Add RowGrouping to fix Row Merging on MultiEngine Queries May 1, 2019
@ryankwagner
Copy link
Collaborator Author

This is on queries that merge a Druid fact + Oracle Dim query. Currently, RowList addRow uses only a single PrimaryKey alias for merging dim into fact. This causes any multi-value Dimension Rows coming from the Fact (Pricing Type) to collapse into 1 row. The fix is creating a RowGrouping object that contains both the fact-dim join key as its indexAlias, as well as any Fact-side Dimension Columns as "group by" keys to preserve the primary row list.

@patelh
Copy link
Collaborator

patelh commented May 1, 2019

You need to first define the use case which you are solving this for. Multi-engine queries are there for particular use cases. The behavior of multi-engine is dependent on fact driven vs dim driven queries so you need to be certain what you are solving.

val dimDrivenPartialRowList: Query => RowList = (q) => {
val indexAlias = q.queryContext.indexAliasOption.get
val groupByKeys: List[String] = q.queryContext.factGroupByKeys diff List(indexAlias)
new DimDrivenPartialRowList(RowGrouping(q.queryContext.indexAliasOption.get, groupByKeys), q)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the meaning of dim driven queries. Dim driven queries will always have only 1 row for each key from driving dim.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch, that actually saved me some debugging. :)

val dimDrivenFactOrderedPartialRowList: Query => RowList = (q) => {
val indexAlias = q.queryContext.indexAliasOption.get
val groupByKeys: List[String] = q.queryContext.factGroupByKeys diff List(indexAlias)
new DimDrivenFactOrderedPartialRowList(RowGrouping(q.queryContext.indexAliasOption.get, groupByKeys), q)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, dim driven implies single row per driving dim key.

@ryankwagner
Copy link
Collaborator Author

I have updated test cases to reflect the new logic. Right now, the join cases for any query using multiple engines then joining the results is incorrect. This adds columns to group by in order to prevent multivalue Dimension columns in fact from being lost.
The test cases in OracleQueryGenerator shows the fix, but current Query behavior is:
selectFields: ["Ad ID", "Ad Title", "Pricing Type"]
returnedRows from Fact: [1, null, "CPC"], [1, null "CPA"]
returnedRows from Dim: [1, "title", null]
Merged returned rows (current): [1, "title", "CPA"]
That is to say, the first of the two rows gets updated and CPC is lost. The new logic creates a grouping on Pricing Type so that we get 2 returned rows instead of 1, both having the correct information.

@@ -501,17 +501,23 @@ class DefaultQueryPipelineFactoryTest extends FunSuite with Matchers with Before
row.addValue("Ad Group ID", 202)
row.addValue("Impressions", 100)
row.addValue("Clicks", 1)
row.addValue("Pricing Type", "CPC")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create separate test for new functionality. Should be easy to copy/paste the parts you need.

@@ -76,6 +76,8 @@ class UnionViewRowListTest extends BaseOracleQueryGeneratorTest with BaseRowList
Map("Impressions" -> DecType(), "Spend" -> DecType()),
constAliasToValueMapList = List(Map("Advertiser ID" -> "12345"), Map("Advertiser ID" -> "12345", "Impressions" -> "1.0")))

assert(rowList.forall(row=>row.isInstanceOf[Row]))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing .forall

@@ -5295,7 +5296,7 @@ class OracleQueryGeneratorTest extends BaseOracleQueryGeneratorTest {
| FROM
| ( (SELECT advertiser_id, campaign_id, DECODE(status, 'ON', 'ON', 'OFF') AS "Ad Group Status", id
| FROM ad_group_oracle
| WHERE (advertiser_id = 213) AND (id IN (45,39,30,51,48,27,33,54,12,15,42,36,21,18,24,53,41,35,17,50,44,23,38,47,26,11,32,14,20,29,46,52,28,34,55,40,49,43,22,16,37,19,25,31,13))
| WHERE (advertiser_id = 213) AND (id IN (45,34,12,51,19,23,40,15,11,44,33,22,55,26,50,37,13,46,24,35,16,48,21,54,43,32,49,36,39,17,25,14,47,31,53,42,20,27,38,18,30,29,41,52,28))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order changes since rows are now stored in a different order, as rowGrouping modifies mapping positions vs indexAlias key.

@pranavbhole pranavbhole merged commit b7c4298 into yahoo:master May 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants