[YCQL] Unable to use range query on a column which is clustered in the index but a partition column in the primary key #7069
A kludgy stop-gap workaround, until we can roll out the fix or enhancement that supports this, would be to add a duplicate column id2 whose contents are the same as the id column, and to build the index on (scope, id2) instead of (scope, id).
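The C++ sketch below shows why the duplicate column avoids the error. It is a hypothetical model, not YugabyteDB code. It assumes the base table's primary key is PRIMARY KEY ((id)) and that the analyzer rejects an inequality on any column that is a partition column of the base table, whatever the index looks like. Under those assumptions, id > ? is rejected, while id2 > ? passes because id2 is a regular column of the table (and a clustering column of the (scope, id2) index).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model: the role a column plays in a table's primary key.
enum class KeyRole { kPartition, kClustering, kRegular };

// Hypothetical schema: column name -> role in the base table.
using Schema = std::map<std::string, KeyRole>;

// Mimics the restriction discussed in this issue: an inequality (<, <=, >, >=)
// on a partition column of the *base table* is rejected, even when an index
// would treat that column as a clustering column.
// Returns an error message, or std::nullopt if the condition is accepted.
std::optional<std::string> CheckInequality(const Schema& table, const std::string& column) {
  auto it = table.find(column);
  if (it == table.end()) {
    return "Undefined column: " + column;
  }
  if (it->second == KeyRole::kPartition) {
    return "Partition column cannot be used in this expression";
  }
  return std::nullopt;
}

// Assumed base table: PRIMARY KEY ((id)), plus regular columns scope and id2,
// where the application keeps id2 equal to id.
Schema BaseTable() {
  return {{"id", KeyRole::kPartition},
          {"scope", KeyRole::kRegular},
          {"id2", KeyRole::kRegular}};
}
```

In this model, CheckInequality(BaseTable(), "id") returns the error and CheckInequality(BaseTable(), "id2") returns std::nullopt. The price of the workaround is that every write must keep id2 in sync with id.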
It looks like the original query works if the "Partition column cannot be used in this expression" error is removed from WhereExprState::AnalyzeColumnOp(). There may be a reason why partition columns are prohibited from appearing in inequality conditions.
I tried to refine the condition under which the error is raised.
For the given example, however, sem_context->selecting_from_index() is false, so the error is still raised. I will keep investigating.
Looking at PTSelectStmt::Analyze(), this is the current order:
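So far, the effort has focused on AnalyzeWhereClause(). However, the child select is determined later, in AnalyzeIndexes(), so when the WHERE clause is analyzed it is not yet known that the query will read from the index.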
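To make the ordering problem concrete, here is a hypothetical C++ sketch. The names Relation, Condition, ConditionsAllowed, AnalyzeWhereThenIndexes, and AnalyzeIndexesThenWhere are illustrative and do not come from the code base. It assumes the restriction discussed in this thread (partition columns may appear only in equality conditions) and shows that validating the WHERE clause before the scan path is chosen can only consult the base table, whereas choosing the scan path first lets the check run against the index that will actually be scanned.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of key roles in a table or an index.
enum class KeyRole { kPartition, kClustering, kRegular };

struct Relation {
  std::string name;
  std::map<std::string, KeyRole> roles;  // column -> role in this relation's key
};

struct Condition {
  std::string column;
  bool is_inequality;  // true for <, <=, >, >=
};

// True if every condition may be used when scanning `scanned`:
// partition columns may only appear in equality conditions.
bool ConditionsAllowed(const Relation& scanned, const std::vector<Condition>& where) {
  for (const Condition& cond : where) {
    auto it = scanned.roles.find(cond.column);
    if (it == scanned.roles.end()) return false;  // column not in this relation
    if (cond.is_inequality && it->second == KeyRole::kPartition) return false;
  }
  return true;
}

// Current order: the WHERE clause is checked before any index is chosen,
// so only the base table's key roles are available.
bool AnalyzeWhereThenIndexes(const Relation& table, const std::vector<Condition>& where) {
  return ConditionsAllowed(table, where);
}

// Alternative order: pick a scan path first, then check the WHERE clause
// against the relation that will actually be scanned.
bool AnalyzeIndexesThenWhere(const Relation& table, const std::vector<Relation>& indexes,
                             const std::vector<Condition>& where) {
  if (ConditionsAllowed(table, where)) return true;
  for (const Relation& index : indexes) {
    if (ConditionsAllowed(index, where)) return true;
  }
  return false;
}
```

With a table keyed on id and an index on (scope, id), the condition scope = ? AND id > ? fails the first order and passes the second. This is roughly the direction the later fix takes: it chooses the scan path before analyzing the remaining clauses.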
Summary: There are several issues with choosing the right INDEX scan when processing a query.

**(A) Issue Background**

When analyzing a SELECT, there should be three clear steps:

1. Analyze references: this step collects and validates the references to tables, columns, and operators.
2. Analyze scan plan: this step chooses an index to scan and saves the scan-spec.
3. Analyze clauses: this step analyzes the clauses according to the chosen scan-spec and prepares for protobuf-code generation.

It is common practice in compilers to decorate the parse tree while processing step (1) and to use the decorated structures to process steps (2) and (3). Unfortunately, YugaByte's existing implementation did not follow this design. It duplicates the parse tree for SELECT: one parse tree represents the outer select, and the other the nested select. The two parse trees are then compiled together, under the same context, for the same SELECT command. This adds complexity and is the source of many bugs.

Additionally, the existing code chose not to process the LIMIT and OFFSET clauses for a SELECT that has a nested INDEX query. This is a bug, which is fixed in a different diff: #7055

**(B) Pseudo Code**

The following pseudo code shows how the existing code is fixed with steps (1) and (2) above. We traverse the given tree to choose the SCAN path first, and then use the parse-tree duo structure for step (3).

```
Analyze(SelectNode) {
  // Traverse the entire SELECT node to collect references for tables, columns, clauses, expressions, operators.
  // Collect all necessary INDEX information into SelectScanInfo.
  AnalyzeReferences(SelectNode, SelectScanInfo);

  // Analyze SelectScanInfo against the created indexes.
  for (each table_index) {
    scan_path = Analyze(SelectScanInfo, table_index);
    Append(scan_list, scan_path);
  }
  scan_spec = BestIndex(scan_list);

  // Duplicate SelectNode to create a nested node that queries the chosen index.
  child_node = Duplicate(SelectNode);

  // Prepare for protobuf-code generation.
  // parent_node refers to the original (outer) SelectNode.
  FurtherAnalysis(child_node, nested_scan_path);
  FurtherAnalysis(parent_node, primary_read_path);
}
```

**(C) Solution Notes**

- This diff does not fix the parse-tree-duo issue because that would require a lot of code changes. If CQL is extended to support more advanced features, we will have to change the design.
- This diff only fixes the INDEX-selection step. That is, it analyzes just ONE parse tree for index selection, chooses the correct INDEX, and then uses the existing code with the parse-tree duo for the rest of the analysis.
- Although the index-selection process now traverses only ONE parse tree, it continues to use the existing code for ORDER BY analysis. To do so, the diff traverses the same ORDER BY tree node once for each INDEX (PRIMARY and SECONDARY) and checks whether that index can be a match. That is, it does not create extra structures to analyze the ORDER BY clause as a normal compiler would; it just uses YugaByte's existing code in a different way.

**(D) Notes on Implementation and Coding**

The general idea is that the user's query

```
SELECT <select_list> FROM <table> WHERE <user's primary key cond> AND <user's regular column cond>
```

will be executed as

```
SELECT <select_list> FROM <table>
  WHERE primary_key IN (SELECT primary_key FROM <index> WHERE <user's primary key cond>)
  AND <user's regular column cond>
```

The nested index query seeks PRIMARY KEY values, and for each PRIMARY KEY, the outer select reads one row from the PRIMARY table.

Coding

- First, we analyze the WHERE, IF, and ORDER BY clauses to find the right INDEX and write the result to a scan-spec variable.
- Later, we use the chosen scan-spec to create a parse-tree duplicate, as the existing code does.
- The rest of the code is kept the same.

Some notes on coding

- Class SelectScanInfo is added. It collects all the information needed for choosing an INDEX.
- Class SelectScanSpec is added to represent the chosen scan path.
- The filter condition on the PRIMARY KEY (hash, range) is voided in the outer SELECT because it is moved to the nested SELECT, as described above.
- The "filtering_exprs_" attribute is moved from PTDmlStmt to PTSelectStmt because it is only needed for processing SELECT. As a result, class IfExprState is removed, since its work is now done in SELECT.

Test Plan: Add TestIndexSelection.java

Reviewers: zyu, oleg, mihnea, pjain

Reviewed By: pjain

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10912
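As a rough illustration of steps (1) and (2), the C++ sketch below scores each candidate key (the primary table and each secondary index) against the conditions collected from the WHERE clause and picks the best one, in the spirit of the BestIndex(scan_list) call in the pseudo code. ScanInfo, KeySpec, ScoreScanPath, and this BestIndex are hypothetical simplifications. The scoring heuristic (every partition column needs an equality condition, then count the usable prefix of clustering columns) is an assumption, not YugabyteDB's actual selection logic.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for SelectScanInfo: which columns the WHERE clause
// constrains with '=' and which with a range operator (<, <=, >, >=).
struct ScanInfo {
  std::set<std::string> eq_columns;
  std::set<std::string> range_columns;
};

// Hypothetical description of the key of the primary table or of an index.
struct KeySpec {
  std::string name;
  std::vector<std::string> hash_columns;   // partition columns
  std::vector<std::string> range_columns;  // clustering columns, in key order
};

// Returns how many key columns the WHERE clause can use, or -1 if some
// partition column lacks an '=' condition (no partition-key lookup possible).
int ScoreScanPath(const ScanInfo& info, const KeySpec& key) {
  for (const std::string& col : key.hash_columns) {
    if (info.eq_columns.count(col) == 0) return -1;
  }
  int score = static_cast<int>(key.hash_columns.size());
  // Clustering columns are usable as a prefix: '=' conditions extend the
  // prefix, and a single range condition ends it.
  for (const std::string& col : key.range_columns) {
    if (info.eq_columns.count(col) != 0) {
      ++score;
    } else {
      if (info.range_columns.count(col) != 0) ++score;
      break;
    }
  }
  return score;
}

// keys[0] is the primary table; a secondary index is chosen only if it scores
// strictly higher. Returns the position of the chosen key in `keys`.
std::size_t BestIndex(const ScanInfo& info, const std::vector<KeySpec>& keys) {
  assert(!keys.empty());
  std::size_t best = 0;
  int best_score = ScoreScanPath(info, keys[0]);
  for (std::size_t i = 1; i < keys.size(); ++i) {
    int score = ScoreScanPath(info, keys[i]);
    if (score > best_score) {
      best = i;
      best_score = score;
    }
  }
  return best;
}
```

For the query in this issue (scope = ? AND id > ?), a primary key of (id) scores -1 because id has no equality condition, while an index on (scope, id) scores 2, so the index is chosen.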
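The rewrite described in (D) can be modeled in memory as follows. This is a hypothetical C++ sketch, not YugabyteDB code: the index is an ordered map from (scope, id) to the primary key, the base table maps the primary key to a row, and the query is assumed to be SELECT ... WHERE scope = ? AND id > ? AND <regular column cond>. The nested select becomes a range scan over the index, and the outer select performs one point read per primary key before applying the regular-column condition.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row of the base table.
struct Row {
  int id;             // primary key (partition column) of the base table
  std::string scope;
  int value;          // a regular column
};

// Base table: primary key -> row.
using PrimaryTable = std::map<int, Row>;
// Secondary index ordered by (scope, id); each entry points back to the primary key.
using ScopeIdIndex = std::map<std::pair<std::string, int>, int>;

// Builds the (scope, id) index from the base table.
ScopeIdIndex BuildIndex(const PrimaryTable& table) {
  ScopeIdIndex index;
  for (const auto& entry : table) {
    index.emplace(std::make_pair(entry.second.scope, entry.second.id), entry.first);
  }
  return index;
}

// Executes  WHERE scope = s AND id > lo AND regular_cond  as
//   WHERE id IN (SELECT id FROM index WHERE scope = s AND id > lo) AND regular_cond.
std::vector<Row> SelectViaIndex(const PrimaryTable& table, const ScopeIdIndex& index,
                                const std::string& scope, int id_lower_exclusive,
                                const std::function<bool(const Row&)>& regular_cond) {
  std::vector<Row> result;
  // Nested select: range scan over the index, starting just after (scope, lo)
  // and stopping at the first entry with a different scope.
  for (auto it = index.upper_bound({scope, id_lower_exclusive});
       it != index.end() && it->first.first == scope; ++it) {
    // Outer select: one point read from the primary table per primary key.
    auto row_it = table.find(it->second);
    if (row_it != table.end() && regular_cond(row_it->second)) {
      result.push_back(row_it->second);
    }
  }
  return result;
}
```

Because the index is ordered by scope first, the scan touches only the entries for the requested scope and can stop as soon as it reaches a different scope value.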
Test Case:
Now, a query of this form is expected to use the index because scope is provided, and we should be able to do a range scan on id, which is a clustering column in the index. However, we run into this error message: