-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RowGrouping to fix Row Merging on MultiEngine Queries #447
Conversation
Dim driven or fact driven? |
…ed in OracleQueryGeneratorTest.
This is on queries that merge a Druid fact + Oracle Dim query. Currently, RowList addRow uses only a single PrimaryKey alias for merging dim into fact. This causes any multi-value Dimension Rows coming from the Fact (Pricing Type) to collapse into 1 row. The fix is creating a RowGrouping object that contains both the fact-dim join key as its indexAlias, as well as any Fact-side Dimension Columns as "group by" keys to preserve the primary row list. |
You need to first define the use case which you are solving this for. Multi-engine queries are there for particular use cases. The behavior of multi-engine is dependent on fact driven vs dim driven queries so you need to be certain what you are solving. |
val dimDrivenPartialRowList: Query => RowList = (q) => { | ||
val indexAlias = q.queryContext.indexAliasOption.get | ||
val groupByKeys: List[String] = q.queryContext.factGroupByKeys diff List(indexAlias) | ||
new DimDrivenPartialRowList(RowGrouping(q.queryContext.indexAliasOption.get, groupByKeys), q) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the meaning of dim driven queries. Dim driven queries will always have only 1 row for each key from driving dim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch, that actually saved me some debugging. :)
val dimDrivenFactOrderedPartialRowList: Query => RowList = (q) => { | ||
val indexAlias = q.queryContext.indexAliasOption.get | ||
val groupByKeys: List[String] = q.queryContext.factGroupByKeys diff List(indexAlias) | ||
new DimDrivenFactOrderedPartialRowList(RowGrouping(q.queryContext.indexAliasOption.get, groupByKeys), q)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, dim driven implies single row per driving dim key.
I have updated test cases to reflect the new logic. Right now, the join cases for any query using multiple engines then joining the results is incorrect. This adds columns to group by in order to prevent multivalue Dimension columns in fact from being lost. |
@@ -501,17 +501,23 @@ class DefaultQueryPipelineFactoryTest extends FunSuite with Matchers with Before | |||
row.addValue("Ad Group ID", 202) | |||
row.addValue("Impressions", 100) | |||
row.addValue("Clicks", 1) | |||
row.addValue("Pricing Type", "CPC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create separate test for new functionality. Should be easy to copy/paste the parts you need.
core/src/test/scala/com/yahoo/maha/core/query/DefaultQueryPipelineFactoryTest.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/com/yahoo/maha/core/query/DefaultQueryPipelineFactoryTest.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/yahoo/maha/core/query/QueryPipeline.scala
Outdated
Show resolved
Hide resolved
@@ -76,6 +76,8 @@ class UnionViewRowListTest extends BaseOracleQueryGeneratorTest with BaseRowList | |||
Map("Impressions" -> DecType(), "Spend" -> DecType()), | |||
constAliasToValueMapList = List(Map("Advertiser ID" -> "12345"), Map("Advertiser ID" -> "12345", "Impressions" -> "1.0"))) | |||
|
|||
assert(rowList.forall(row=>row.isInstanceOf[Row])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing .forall
@@ -5295,7 +5296,7 @@ class OracleQueryGeneratorTest extends BaseOracleQueryGeneratorTest { | |||
| FROM | |||
| ( (SELECT advertiser_id, campaign_id, DECODE(status, 'ON', 'ON', 'OFF') AS "Ad Group Status", id | |||
| FROM ad_group_oracle | |||
| WHERE (advertiser_id = 213) AND (id IN (45,39,30,51,48,27,33,54,12,15,42,36,21,18,24,53,41,35,17,50,44,23,38,47,26,11,32,14,20,29,46,52,28,34,55,40,49,43,22,16,37,19,25,31,13)) | |||
| WHERE (advertiser_id = 213) AND (id IN (45,34,12,51,19,23,40,15,11,44,33,22,55,26,50,37,13,46,24,35,16,48,21,54,43,32,49,36,39,17,25,14,47,31,53,42,20,27,38,18,30,29,41,52,28)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
order changes since rows are now stored in a different order, as rowGrouping modifies mapping positions vs indexAlias key.
No description provided.