[#6821 #7344 #7069] (YCQL) Fix issues when selecting optimal scan path
Summary:
There are several issues with choosing the right INDEX scan when processing a query.

**(A) Issue Background**
When analyzing a SELECT, there should be three clear steps.
1. Analyze references: collect and validate the references to tables, columns, and operators.
2. Analyze scan plan: choose an index to scan and save the scan spec.
3. Analyze clauses: analyze the clauses according to the chosen scan spec and prepare for protobuf-code generation.

It is common practice in compilers to decorate the parse tree in step (1) and use the decorated structures in steps (2) and (3). Unfortunately, YugaByte's existing implementation did not follow this design. It duplicates the parse tree for SELECT: one parse tree represents the outer select and the other the nested select, and the two parse trees are then compiled together under the same context for the same SELECT statement. This adds complexity and is the source of many bugs.

Additionally, the existing code chose not to process the LIMIT and OFFSET clauses for a SELECT that has a nested INDEX query. That bug is fixed in a separate diff:
#7055

**(B) Pseudo Code**
The following pseudocode shows how the existing code is fixed for steps (1) and (2) above. We traverse the given tree once to choose the scan path first and then use the parse-tree-duo structure for step (3). A sketch of how the best scan path might be scored follows the pseudocode.
```
Analyze(SelectNode) {
  // Traverse the entire SELECT node to collect references for table, column, clauses, expressions, operators.
  // Collect all necessary INDEX information into SelectScanInfo
  AnalyzeReferences (SelectNode, SelectScanInfo);

  // Analyze SelectScanInfo against the created indexes.
  for (each table_index) {
    scan_path = Analyze(SelectScanInfo, table_index)
    Append(scan_list, scan_path);
  }
  scan_spec = BestIndex(scan_list);

  // Duplicate SelectNode to create a nested node that queries the chosen index.
  parent_node = SelectNode;
  child_node = Duplicate(SelectNode);

  // Prepare for protobuf-code generation.
  FurtherAnalysis(child_node, nested_scan_path);   // Nested read from the chosen index.
  FurtherAnalysis(parent_node, primary_read_path); // Outer read from the primary table.
}
```
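The BestIndex step is where the selection heuristic lives. As a rough C++ sketch only (the struct, the field names, and the exact ordering of tie-breakers below are illustrative assumptions, not the code in this diff), choosing the best path can be thought of as comparing candidates on how much of the key they bind, whether they satisfy ORDER BY, and whether an extra read back to the primary table would be needed:

```
// Illustrative sketch of comparing candidate scan paths; names and the exact
// tie-breaking order are hypothetical, not the heuristic in this diff.
struct ScanPathCandidate {
  bool is_primary = false;          // Scan of the primary table itself.
  bool fully_covered = false;       // Index covers every column the SELECT needs.
  int prefix_columns_matched = 0;   // Key columns bound by the WHERE clause.
  bool matches_order_by = false;    // Index range order satisfies ORDER BY.
};

// Returns true if 'a' should be preferred over 'b'.
bool IsBetterPath(const ScanPathCandidate& a, const ScanPathCandidate& b) {
  // Prefer the path that binds more of its key columns with the user's conditions.
  if (a.prefix_columns_matched != b.prefix_columns_matched) {
    return a.prefix_columns_matched > b.prefix_columns_matched;
  }
  // Prefer a path that can return rows in the requested order.
  if (a.matches_order_by != b.matches_order_by) {
    return a.matches_order_by;
  }
  // Prefer the primary table, then a covering index, to avoid the extra
  // read back to the primary table.
  if (a.is_primary != b.is_primary) {
    return a.is_primary;
  }
  return a.fully_covered && !b.fully_covered;
}
```
Under this kind of scoring, the testPrimaryIndexScan case below (WHERE i = 1 AND x = 1) prefers the primary-key lookup over second_idx, which is the behavior the new test asserts.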
**(C) Solution Notes**
- This diff does not fix the parse-tree-duo issue because that would require a lot of code changes. If CQL is extended to support more advanced features, the design will have to change.
- This diff fixes only the INDEX-selection step. That is, it analyzes just ONE parse tree for index selection, chooses the correct INDEX, and then uses the existing parse-tree-duo code for the rest of the analysis.
- Although the index-selection process now traverses only ONE parse tree, it continues to use the existing code for ORDER BY analysis. To do so, the diff traverses the same ORDER BY tree node once for each INDEX (PRIMARY and SECONDARY) and checks whether that index can be a match. That is, it does not create extra structures to analyze the ORDER BY clause as a normal compiler would; it just uses YugaByte's existing code in a different way (a sketch of the per-index check follows this list).
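A minimal C++ sketch of that per-index ORDER BY check, assuming the usual rule that the ORDER BY columns must form a prefix of the index's range (clustering) columns (the helper name is hypothetical; the diff reuses the existing ORDER BY analysis code rather than a standalone function like this):

```
#include <string>
#include <vector>

// Hypothetical helper: can this index return rows in the requested order?
// Sort direction (ASC/DESC) also matters in practice but is omitted here.
bool IndexMatchesOrderBy(const std::vector<std::string>& index_range_columns,
                         const std::vector<std::string>& order_by_columns) {
  if (order_by_columns.size() > index_range_columns.size()) {
    return false;
  }
  for (size_t i = 0; i < order_by_columns.size(); ++i) {
    if (order_by_columns[i] != index_range_columns[i]) {
      return false;
    }
  }
  return true;
}
```
With the indexes from testOrderByIndexScan below, ORDER BY m is satisfied only by jdx1 (j, m) and ORDER BY n only by jdx2 (j, n), which is why the two EXPLAIN assertions expect different indexes.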

**(D) Notes on Implementation and Coding**
The general idea is that the user's query
```SELECT <select_list> FROM <table> WHERE <user's primary key cond> AND <user's regular column cond>```
will be executed as
```SELECT <select_list> FROM <table> WHERE primary_key IN (SELECT primary_key FROM <index> WHERE <user's primary key cond>) AND <user's regular column cond>```
The nested index query seeks PRIMARY KEY values, and for each PRIMARY KEY the outer select reads one row from the PRIMARY table.
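A toy, self-contained C++ model of this execution, using data shaped like the test_primary_index table from the test below (an index on column x that maps back to primary key i); everything here is illustrative and none of it is the actual read path:

```
#include <iostream>
#include <map>

// Toy model: PRIMARY KEY (i), secondary index on (x). The index maps an index
// key to the primary key of a row; the outer read then fetches that row.
struct RowValue { int j, x, y; };

int main() {
  std::map<int, RowValue> primary_table = {      // i -> (j, x, y)
      {1, {1, 3, 3}}, {2, {2, 2, 2}}, {3, {3, 1, 1}}};
  std::multimap<int, int> index_on_x = {         // x -> i
      {3, 1}, {2, 2}, {1, 3}};

  // SELECT * FROM test_primary_index WHERE x = 2:
  // the nested index read yields primary keys; the outer read fetches each row.
  auto [lo, hi] = index_on_x.equal_range(2);
  for (auto it = lo; it != hi; ++it) {
    const RowValue& row = primary_table.at(it->second);
    std::cout << it->second << ", " << row.j << ", " << row.x << ", " << row.y << "\n";
  }
  return 0;
}
```
This prints "2, 2, 2, 2", the same row asserted as Row[2, 2, 2, 2] in testPrimaryIndexScan.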

Coding
- First, we analyze the WHERE, IF, and ORDER BY clauses to find the right INDEX and write the result to a scan-spec variable.
- Later, we use the chosen scan-spec to create a parse-tree duplicate, as the existing code does.
- The rest of the code is kept the same.

Some notes on coding (a sketch of the new classes follows this list)
- Class SelectScanInfo is added. It collects all the information that is needed for choosing an INDEX.
- Class SelectScanSpec is added to represent the chosen scan path.
- The filter condition on the PRIMARY KEY (hash, range) is voided in the outer SELECT because the condition is moved to the nested SELECT as described above.
- The "filtering_exprs_" attribute is moved from PTDmlStmt to PTSelectStmt because it is only needed when processing SELECT. Class IfExprState is therefore removed, as its work is now done in SELECT analysis.

Test Plan: Add TestIndexSelection.java

Reviewers: zyu, oleg, mihnea, pjain

Reviewed By: pjain

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10912
nocaway committed Apr 1, 2021
1 parent 714a51e commit fb3f504
Showing 11 changed files with 1,136 additions and 331 deletions.
268 changes: 268 additions & 0 deletions java/yb-cql/src/test/java/org/yb/cql/TestIndexSelection.java
@@ -0,0 +1,268 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* The following only applies to changes made to this file as part of YugaByte development.
*
* Portions Copyright (c) YugaByte, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
* except in compliance with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the
* License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
* either express or implied. See the License for the specific language governing permissions
* and limitations under the License.
*/
package org.yb.cql;

import java.util.*;

import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ColumnDefinitions;
import com.datastax.driver.core.ColumnDefinitions.Definition;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.TableMetadata;
import com.datastax.driver.core.exceptions.InvalidQueryException;

import org.yb.minicluster.BaseMiniClusterTest;
import org.yb.minicluster.MiniYBCluster;
import org.yb.minicluster.RocksDBMetrics;
import org.yb.util.SanitizerUtil;
import org.yb.util.TableProperties;

import static org.yb.AssertionWrappers.assertEquals;
import static org.yb.AssertionWrappers.assertFalse;
import static org.yb.AssertionWrappers.assertNotNull;
import static org.yb.AssertionWrappers.assertNull;
import static org.yb.AssertionWrappers.assertTrue;
import static org.yb.AssertionWrappers.fail;

import org.yb.YBTestRunner;

import org.junit.runner.RunWith;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@RunWith(value=YBTestRunner.class)
public class TestIndexSelection extends BaseCQLTest {
private static final Logger LOG = LoggerFactory.getLogger(TestIndexSelection.class);

@Override
public int getTestMethodTimeoutSec() {
// Usual time for a test ~90 seconds. But can be much more on Jenkins.
return super.getTestMethodTimeoutSec()*10;
}

@BeforeClass
public static void SetUpBeforeClass() throws Exception {
BaseMiniClusterTest.tserverArgs.add("--allow_index_table_read_write");
BaseMiniClusterTest.tserverArgs.add(
"--index_backfill_upperbound_for_user_enforced_txn_duration_ms=1000");
BaseMiniClusterTest.tserverArgs.add(
"--index_backfill_wait_for_old_txns_ms=100");
BaseCQLTest.setUpBeforeClass();
}

@Before
public void setUpTest() throws Exception {
createKeyspace("yugatest");
useKeyspace("yugatest");
}

// Test issue where PRIMARY INDEX should be chosen over SECONDARY INDEX, but it is not.
@Test
public void testPrimaryIndexScan() throws Exception {
//----------------------------------------------------------------------------------------------
// Test for single read.
// Table.
session.execute("CREATE TABLE test_primary_index" +
" (i INT, j INT, x INT, y INT, PRIMARY KEY (i))" +
" WITH TRANSACTIONS = {'enabled' : true};");

// Index.
session.execute("CREATE INDEX second_idx ON test_primary_index(x);");

// Insert data.
session.execute("INSERT INTO test_primary_index(i, j, x, y) VALUES(1, 1, 3, 3);");
session.execute("INSERT INTO test_primary_index(i, j, x, y) VALUES(2, 2, 2, 2);");
session.execute("INSERT INTO test_primary_index(i, j, x, y) VALUES(3, 3, 1, 1);");

// Primary lookup should be chosen.
assertQuery("EXPLAIN SELECT * FROM test_primary_index WHERE i = 1 AND x = 1;",
"Row[Primary Key Lookup on yugatest.test_primary_index]" +
"Row[ Key Conditions: (i = 1) ]" +
"Row[ Filter: (x = 1) ]");

assertQuery("SELECT * FROM test_primary_index WHERE i = 1 AND x = 1;", "");
assertQuery("SELECT * FROM test_primary_index WHERE i = 2 AND x = 2;", "Row[2, 2, 2, 2]");

//----------------------------------------------------------------------------------------------
// Test for range scan.
// Table
if (false) {
// TODO(neil) Comment this out for now.
// Currently, there are tests that expect SECONDARY INDEX to be chosen over PRIMARY.
// I don't yet understand their logic, so this test is commented out for now.
session.execute("CREATE TABLE test_primary_range" +
" (i INT, j INT, x INT, y INT, v1 INT, v2 INT, PRIMARY KEY (i, j))" +
" WITH TRANSACTIONS = {'enabled' : true};");

// Index.
session.execute("CREATE INDEX second_range ON test_primary_range(x, y);");

// Insert data.
session.execute("INSERT INTO test_primary_range(i, j, x, y, v1, v2)" +
" VALUES(1, 1, 1, 3, 101, 103);");
session.execute("INSERT INTO test_primary_range(i, j, x, y, v1, v2)" +
" VALUES(1, 2, 1, 2, 102, 102);");
session.execute("INSERT INTO test_primary_range(i, j, x, y, v1, v2)" +
" VALUES(1, 3, 1, 1, 103, 101);");

// Primary range scan should be chosen.
assertQuery("EXPLAIN SELECT * FROM test_primary_range WHERE i = 1 AND x = 1;",
"Row[Range Scan on yugatest.test_primary_range]" +
"Row[ Key Conditions: (i = 1) ]" +
"Row[ Filter: (x = 1) ]");

assertQuery("SELECT * FROM test_primary_range WHERE i = 1 AND x = 1 LIMIT 2;",
"Row[1, 1, 1, 3, 101, 103]" +
"Row[1, 2, 1, 2, 102, 102]");

assertQuery("SELECT * FROM test_primary_range WHERE x = 1 LIMIT 2;",
"Row[1, 3, 1, 1, 103, 101]" +
"Row[1, 2, 1, 2, 102, 102]");
}
}

// Test issues where PRIMARY KEY condition was wrongly enforced including github #7069.
@Test
public void testSecondaryIndexScan() throws Exception {
// Table.
session.execute("CREATE TABLE test_secondary_index" +
" ( h1 INT, h2 INT, r INT, i1 INT, i2 INT, j INT, val INT," +
" PRIMARY KEY ((h1, h2), r) )" +
" WITH transactions = {'enabled' : true};");
// Index.
session.execute("CREATE INDEX second_idx ON test_secondary_index((i1, i2), j);");

// Insert data.
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (1, 1, 1, 1, 1, 1, 101);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (1, 2, 2, 1, 2, 2, 102);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (1, 3, 3, 1, 3, 3, 103);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (2, 1, 4, 2, 1, 4, 104);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (2, 2, 5, 2, 2, 5, 105);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (2, 3, 6, 2, 3, 6, 106);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (3, 1, 7, 3, 1, 7, 107);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (3, 2, 8, 3, 2, 8, 108);");
session.execute("INSERT INTO test_secondary_index(h1, h2, r, i1, i2, j, val)" +
" VALUES (3, 3, 9, 3, 3, 9, 109);");

// Make sure secondary index is chosen.
assertQuery("EXPLAIN SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val < 110;",
"Row[Index Scan using yugatest.second_idx on yugatest.test_secondary_index]" +
"Row[ Key Conditions: (i1 IN expr) AND (i2 = 2) ]" +
"Row[ Filter: (h1 IN expr) AND (r > 0) ]");

// Check that dataset is correct.
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 = 1 AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val < 110;",
"Row[1, 2, 2, 1, 2, 2, 102]");
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 > 3 AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val < 110;",
"");
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 < 5 AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val < 110;",
"Row[1, 2, 2, 1, 2, 2, 102]" +
"Row[2, 2, 5, 2, 2, 5, 105]" +
"Row[3, 2, 8, 3, 2, 8, 108]");

assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 3 AND val < 110;",
"Row[2, 2, 5, 2, 2, 5, 105]" +
"Row[3, 2, 8, 3, 2, 8, 108]");
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 6 AND val < 110;",
"Row[3, 2, 8, 3, 2, 8, 108]");
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 9 AND val < 110;",
"");

assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val > 102;",
"Row[2, 2, 5, 2, 2, 5, 105]" +
"Row[3, 2, 8, 3, 2, 8, 108]");
assertQuery("SELECT * FROM test_secondary_index" +
" WHERE h1 IN (1, 2, 3) AND i1 IN (1, 2, 3) AND i2 = 2 AND r > 0 AND val < 105;",
"Row[1, 2, 2, 1, 2, 2, 102]");
}

@Test
public void testOrderByIndexScan() throws Exception {
// Table.
session.execute("CREATE TABLE t_orderby_scan (i INT, j INT, m INT, n INT, PRIMARY KEY (i))" +
" WITH TRANSACTIONS = {'enabled' : true};");

// Indexes.
session.execute("CREATE INDEX jdx1 ON t_orderby_scan (j, m);");
session.execute("CREATE INDEX jdx2 ON t_orderby_scan (j, n);");

// Insert data.
session.execute("INSERT INTO t_orderby_scan(i, j, m, n) VALUES (1, 1, 1, 4);");
session.execute("INSERT INTO t_orderby_scan(i, j, m, n) VALUES (2, 1, 2, 3);");
session.execute("INSERT INTO t_orderby_scan(i, j, m, n) VALUES (3, 1, 3, 2);");
session.execute("INSERT INTO t_orderby_scan(i, j, m, n) VALUES (4, 1, 4, 1);");

// Index "jdx1" should be chosen.
assertQuery("EXPLAIN SELECT * FROM t_orderby_scan WHERE j = 1 order by m;",
"Row[Index Scan using yugatest.jdx1 on yugatest.t_orderby_scan]" +
"Row[ Key Conditions: (j = 1) ]");

assertQuery("SELECT * FROM t_orderby_scan WHERE j = 1 ORDER BY m LIMIT 2;",
"Row[1, 1, 1, 4]" +
"Row[2, 1, 2, 3]");

// Index "jdx2" should be chosen.
assertQuery("EXPLAIN SELECT * FROM t_orderby_scan WHERE j = 1 order by n;",
"Row[Index Scan using yugatest.jdx2 on yugatest.t_orderby_scan]" +
"Row[ Key Conditions: (j = 1) ]");

assertQuery("SELECT * FROM t_orderby_scan WHERE j = 1 order by n LIMIT 2;",
"Row[4, 1, 4, 1]" +
"Row[3, 1, 3, 2]");
}
}
4 changes: 4 additions & 0 deletions src/yb/yql/cql/ql/ptree/parse_tree.cc
@@ -53,6 +53,10 @@ CHECKED_STATUS ParseTree::Analyze(SemContext *sem_context) {
return Status::OK();
}

// Each analysis process needs to have state variables.
// Setup a base sem_state variable before traversing the statement tree.
SemState sem_state(sem_context);

DCHECK_EQ(root_->opcode(), TreeNodeOpcode::kPTListNode) << "statement list expected";
const auto lnode = std::static_pointer_cast<PTListNode>(root_);
switch (lnode->size()) {
21 changes: 9 additions & 12 deletions src/yb/yql/cql/ql/ptree/pt_dml.cc
Expand Up @@ -56,8 +56,7 @@ PTDmlStmt::PTDmlStmt(MemoryContext *memctx,
column_refs_(memctx),
static_column_refs_(memctx),
pk_only_indexes_(memctx),
non_pk_only_indexes_(memctx),
filtering_exprs_(memctx) {
non_pk_only_indexes_(memctx) {
}

// Clone a DML tnode for re-analysis. Only the syntactic information populated by the parser should
@@ -82,8 +81,7 @@ PTDmlStmt::PTDmlStmt(MemoryContext *memctx, const PTDmlStmt& other, bool copy_if
column_refs_(memctx),
static_column_refs_(memctx),
pk_only_indexes_(memctx),
non_pk_only_indexes_(memctx),
filtering_exprs_(memctx) {
non_pk_only_indexes_(memctx) {
}

PTDmlStmt::~PTDmlStmt() {
@@ -263,8 +261,7 @@ Status PTDmlStmt::AnalyzeWhereExpr(SemContext *sem_context, PTExpr *expr) {
ColumnOpCounter partition_key_counter;
WhereExprState where_state(&where_ops_, &key_where_ops_, &subscripted_col_where_ops_,
&json_col_where_ops_, &partition_key_ops_, &op_counters,
&partition_key_counter, opcode(), &func_ops_,
&filtering_exprs_);
&partition_key_counter, opcode(), &func_ops_);

SemState sem_state(sem_context, QLType::Create(BOOL), InternalType::kBoolValue);
sem_state.SetWhereState(&where_state);
@@ -346,10 +343,8 @@ Status PTDmlStmt::AnalyzeWhereExpr(SemContext *sem_context, PTExpr *expr) {

Status PTDmlStmt::AnalyzeIfClause(SemContext *sem_context) {
if (if_clause_) {
IfExprState if_state(&filtering_exprs_);
SemState sem_state(sem_context, QLType::Create(BOOL), InternalType::kBoolValue);
sem_state.set_processing_if_clause(true);
sem_state.SetIfState(&if_state);
return if_clause_->Analyze(sem_context);
}
return Status::OK();
@@ -482,13 +477,15 @@ Status WhereExprState::AnalyzeColumnOp(SemContext *sem_context,
const ColumnDesc *col_desc,
PTExpr::SharedPtr value,
PTExprListNode::SharedPtr col_args) {
// Collecting all filtering expressions to help choosing INDEX when processing a DML.
filtering_exprs_->push_back(expr);

// If this is a nested select from an uncovered index, ignore column that is uncovered.
if (col_desc == nullptr && sem_context->IsUncoveredIndexSelect()) {
return Status::OK();
}
if (col_desc->is_primary() && sem_context->void_primary_key_condition()) {
// Drop the key condition from where clause as instructed.
return Status::OK();
}

ColumnOpCounter& counter = op_counters_->at(col_desc->index());
switch (expr->ql_op()) {
case QL_OP_EQUAL: {
@@ -536,8 +533,8 @@ Status WhereExprState::AnalyzeColumnOp(SemContext *sem_context,
case QL_OP_LESS_THAN_EQUAL: FALLTHROUGH_INTENDED;
case QL_OP_GREATER_THAN_EQUAL: FALLTHROUGH_INTENDED;
case QL_OP_GREATER_THAN: {

// Inequality conditions on hash columns are not allowed.
// - Ignore the error if key condition is not needed.
if (col_desc->is_hash()) {
return sem_context->Error(expr, "Partition column cannot be used in this expression",
ErrorCode::CQL_STATEMENT_INVALID);
