Skip to content

Commit

Permalink
[YSQL] #1126 Row-level partitioning support in YSQL (#5179)
Browse files Browse the repository at this point in the history
Summary:

Enable explicit table-partitioning (i.e. using the `PARTITION BY` clause) in YSQL.
See this architecture doc:  https://github.com/yugabyte/yugabyte-db/blob/master/architecture/design/ysql-row-level-partitioning.md for more details.

Test Plan:
Adds new test: org.yb.pgsql.TestPgRegressPartitions containing the following regress tests:
  yb_pg_partition_prune
  yb_pg_partition_join
  yb_pg_partition_aggregate
  yb_pg_partition_update

Individual changes and fixes are detailed below:

* removed all the modifications made to the grammar that disable any partitions related syntax.

* Fixed a bug where StoreCatalogInheritance was called twice during table creation. 
This causes a bug in the following manner: i) Child table is created on the heap, but no partition information about it has been updated in the catalog yet. ii) StoreCatalogInheritance is called, creating an entry for it in pg_inherits, but no information about its partition bounds is stored in pg_class iii) When StorePartitionBound is called, postgres tries to check whether the parent table has any child with an overlapping range iv) POstgres sees the newly created child table in pg_inherits v) However, when it tries to look up its bounds information in the catalog, there is understandably no bounds information for the new table (as we haven't updated it yet). An assertion fires, stating that there is a child table with no bounds information.

To fix this issue, removed the extraneous StoreCatalogInheritance call.
Also noticed that postgres 11.2 source code actually does not have
this extraneous call.

* YBPreloadRelCache fix

Today YBPreloadRelCache during initialization, has two phases:
1) Scan pg_class table
2) Scan pg_attribute table
This means the first scan populates some relations into the cache
without their corresponding attribute information. This is usually
okay, but this breaks once partitions are enabled.
This is because, in the first phase, there was a special handling
for partitioned tables that tried to update the partition key and
bounds information. This requires the attribute info for
pg_partitioned_table and pg_type catalog tables. However, since we
were still in the first phase of YBPreloadRelCache, we will hit
assertion here as these catalog tables will be present in the cache
without any attribute information.
To fix this, this patch does not update the partition information
in the first phase. Since we need the attribute info populated for
multiple catalog tables, and we cannot be sure whether these info
were all populated when we try to process the partitioned table
in the second phase, this patch creates a third phase.
In the third phase, we scan the pg_partitioned_table, and update the
cache for any partitioned tables with partition information.

* For queries on partitioned tables, when indices were created on the tables, any join queries where failing.

This is because postgres was picking a strategy MergeAppend
to combine values from multiple child tables. MergeAppend
expects the input streams to be merged to be sorted.
However, in YB, by default, any single column indices are
hash based and unordered. Still, MergeAppend was treating
the values returned by IndexScans as ordered.

This is because we try to pick hash index if there is an equality
comparison in the equivalence class. However, the corresponding pathkey
is created with strategy BTLessStrategyNumber or
BTGreaterStrategyNumber always depending on the value passed
for reverse_sort. Thus, when hash indices were picked, the merge
append would assume that the output returned by this pathkey
was already sorted appropriately.

Fixed this by ensuring that when we pick a hash index, the
corresponding pathkey should always have strategy BTEqualStrategyNumber.
This way, the output from this pathkey is not considered to be
ordered, and partition queries start returning correct results.

* Add tests taken from partition_join.sql and partition_aggregate.sql in two new files yb_partition_join.sql and yb_partition_aggregate.sql. These files contain the same queries as the original postgres tests, but queries of the form "EXPLAIN.." and "ALTER TABLE" have been omitted as they are expected to fail. All other queries testing partition aggregates and partition joins work as expected and are included in the new files.

* For partitioned tables, when the partition key of a row is updated such that the partition constraint is no longer satisfied, PostgreSQL converts the operation into 2 operations - a DELETE from the current partition, and an INSERT into the new partition, however this is not a transaction. The check for row movement, and conversion into a DELETE+INSERT was disabled for Yugabyte tables, so the update was returning success without actually doing anything. In this patch, returning failure if partition constraints are being violated. In a later patch, this row movement will be performed as a transaction between the two partition tables. Additionally there were a few bugs in the UPDATE path owing to the fact that there were some occurrences of FirstLowInvalidHeapAttributeNumber were not replaced by the YB equivalent in the partitions code path. Fixed these bugs and added more tests in the update code path in this commit.
  • Loading branch information
deeps1991 committed Aug 2, 2020
1 parent b4bf6d1 commit 955a872
Show file tree
Hide file tree
Showing 32 changed files with 9,665 additions and 427 deletions.
@@ -0,0 +1,33 @@
// Copyright (c) YugaByte, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
// in compliance with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations
// under the License.
//
package org.yb.pgsql;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.yb.util.YBTestRunnerNonTsanOnly;

/**
* Runs the pg_regress test suite on YB code.
*/
@RunWith(value=YBTestRunnerNonTsanOnly.class)
public class TestPgRegressPartitions extends BasePgSQLTest {
@Override
public int getTestMethodTimeoutSec() {
return 1800;
}

@Test
public void testPgRegressPartitions() throws Exception {
runPgRegressTest("yb_pg_partitions");
}
}
12 changes: 8 additions & 4 deletions src/postgres/src/backend/catalog/partition.c
Expand Up @@ -28,13 +28,13 @@
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "partitioning/partbounds.h"
#include "pg_yb_utils.h"
#include "rewrite/rewriteManip.h"
#include "utils/fmgroids.h"
#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/syscache.h"


static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
List **ancestors);
Expand Down Expand Up @@ -225,8 +225,8 @@ has_partition_attrs(Relation rel, Bitmapset *attnums, bool *used_in_expr)

if (partattno != 0)
{
if (bms_is_member(partattno - FirstLowInvalidHeapAttributeNumber,
attnums))
if (bms_is_member(partattno - YBGetFirstLowInvalidAttributeNumber(rel),
attnums))
{
if (used_in_expr)
*used_in_expr = false;
Expand All @@ -240,7 +240,11 @@ has_partition_attrs(Relation rel, Bitmapset *attnums, bool *used_in_expr)
Bitmapset *expr_attrs = NULL;

/* Find all attributes referenced */
pull_varattnos(expr, 1, &expr_attrs);
/* pull_varattnos_min_attr offsets attributes by min_attr - 1. Therefore pass
* YBGetFirstLowInvalidAttributeNumber + 1.
*/
pull_varattnos_min_attr(expr, 1, &expr_attrs,
YBGetFirstLowInvalidAttributeNumber(rel) + 1);
partexprs_item = lnext(partexprs_item);

if (bms_overlap(attnums, expr_attrs))
Expand Down
7 changes: 2 additions & 5 deletions src/postgres/src/backend/commands/tablecmds.c
Expand Up @@ -842,9 +842,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
InvalidOid,
typaddress);

/* Store inheritance information for new rel. */
StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);

/*
* We must bump the command counter to make the newly-created relation
* tuple visible for opening.
Expand Down Expand Up @@ -6940,7 +6937,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,

/* Don't drop columns used in the partition key */
if (has_partition_attrs(rel,
bms_make_singleton(attnum - FirstLowInvalidHeapAttributeNumber),
bms_make_singleton(attnum - YBGetFirstLowInvalidAttributeNumber(rel)),
&is_expr))
{
if (!is_expr)
Expand Down Expand Up @@ -9667,7 +9664,7 @@ ATPrepAlterColumnType(List **wqueue,

/* Don't alter columns used in the partition key */
if (has_partition_attrs(rel,
bms_make_singleton(attnum - FirstLowInvalidHeapAttributeNumber),
bms_make_singleton(attnum - YBGetFirstLowInvalidAttributeNumber(rel)),
&is_expr))
{
if (!is_expr)
Expand Down
4 changes: 3 additions & 1 deletion src/postgres/src/backend/commands/ybccmds.c
Expand Up @@ -408,7 +408,7 @@ void
YBCCreateTable(CreateStmt *stmt, char relkind, TupleDesc desc,
Oid relationId, Oid namespaceId, Oid tablegroupId)
{
if (relkind != RELKIND_RELATION)
if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
{
return;
}
Expand Down Expand Up @@ -894,6 +894,8 @@ YBCPrepareAlterTable(AlterTableStmt *stmt, Relation rel, Oid relationId)
case AT_DisableRowSecurity:
case AT_ForceRowSecurity:
case AT_NoForceRowSecurity:
case AT_AttachPartition:
case AT_DetachPartition:
/* For these cases a YugaByte alter isn't required, so we do nothing. */
break;

Expand Down
20 changes: 9 additions & 11 deletions src/postgres/src/backend/executor/nodeModifyTable.c
Expand Up @@ -431,17 +431,11 @@ ExecInsert(ModifyTableState *mtstate,
* one; except that if we got here via tuple-routing, we don't need to
* if there's no BR trigger defined on the partition.
*/
if (!IsYBRelation(resultRelationDesc))
{
/*
* TODO(Hector) When partitioning is supported in YugaByte, this check must be enabled.
*/
if (resultRelInfo->ri_PartitionCheck &&
(resultRelInfo->ri_PartitionRoot == NULL ||
(resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
ExecPartitionCheck(resultRelInfo, slot, estate, true);
}
if (resultRelInfo->ri_PartitionCheck &&
(resultRelInfo->ri_PartitionRoot == NULL ||
(resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
ExecPartitionCheck(resultRelInfo, slot, estate, true);

if (onconflict != ONCONFLICT_NONE && resultRelInfo->ri_NumIndices > 0)
{
Expand Down Expand Up @@ -1133,6 +1127,10 @@ ExecUpdate(ModifyTableState *mtstate,
if (resultRelationDesc->rd_att->constr)
ExecConstraints(resultRelInfo, slot, estate, mtstate);

/*
* Verify that the update does not violate partition constraints.
*/
ExecPartitionCheck(resultRelInfo, slot, estate, true /* emitError */);

RangeTblEntry *rte = rt_fetch(resultRelInfo->ri_RangeTableIndex,
estate->es_range_table);
Expand Down
17 changes: 11 additions & 6 deletions src/postgres/src/backend/optimizer/path/pathkeys.c
Expand Up @@ -185,7 +185,12 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
List *opfamilies;
EquivalenceClass *eclass;

strategy = reverse_sort ? BTGreaterStrategyNumber : BTLessStrategyNumber;
if (is_hash_index) {
/* We are picking a hash index. The strategy can only be BTEqualStrategyNumber */
strategy = BTEqualStrategyNumber;
} else {
strategy = reverse_sort ? BTGreaterStrategyNumber : BTLessStrategyNumber;
}

/*
* EquivalenceClasses need to contain opfamily lists based on the family
Expand Down Expand Up @@ -215,11 +220,11 @@ make_pathkey_from_sortinfo(PlannerInfo *root,
return NULL;

/* This "eclass" is either a "=" or "sort" operator, and for hash_columns, we allow equality
* condition but not ASC or DESC sorting.
*/
if (is_hash_index && eclass->ec_sortref != 0) {
return NULL;
}
* condition but not ASC or DESC sorting.
*/
if (is_hash_index && eclass->ec_sortref != 0) {
return NULL;
}

/* And finally we can find or create a PathKey node */
return make_canonical_pathkey(root, eclass, opfamily,
Expand Down
47 changes: 25 additions & 22 deletions src/postgres/src/backend/optimizer/prep/prepunion.c
Expand Up @@ -48,6 +48,7 @@
#include "optimizer/tlist.h"
#include "parser/parse_coerce.h"
#include "parser/parsetree.h"
#include "pg_yb_utils.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/selfuncs.h"
Expand Down Expand Up @@ -115,8 +116,9 @@ static void make_inh_translation_list(Relation oldrelation,
Relation newrelation,
Index newvarno,
List **translated_vars);
static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars);
static Bitmapset *translate_col_privs(Relation parentrel,
const Bitmapset *parent_privs,
List *translated_vars);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
static Relids adjust_child_relids(Relids relids, int nappinfos,
Expand Down Expand Up @@ -1840,12 +1842,15 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
*/
if (childOID != parentOID)
{
childrte->selectedCols = translate_col_privs(parentrte->selectedCols,
appinfo->translated_vars);
childrte->insertedCols = translate_col_privs(parentrte->insertedCols,
appinfo->translated_vars);
childrte->updatedCols = translate_col_privs(parentrte->updatedCols,
appinfo->translated_vars);
childrte->selectedCols = translate_col_privs(parentrel,
parentrte->selectedCols,
appinfo->translated_vars);
childrte->insertedCols = translate_col_privs(parentrel,
parentrte->insertedCols,
appinfo->translated_vars);
childrte->updatedCols = translate_col_privs(parentrel,
parentrte->updatedCols,
appinfo->translated_vars);
}
}

Expand Down Expand Up @@ -1963,7 +1968,6 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"",
attname, RelationGetRelationName(newrelation));
}

/* Found it, check type and collation match */
if (atttypid != att->atttypid || atttypmod != att->atttypmod)
elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type",
Expand Down Expand Up @@ -1995,30 +1999,27 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
* we set the per-column bits for all inherited columns.
*/
static Bitmapset *
translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars)
translate_col_privs(Relation parentrel,
const Bitmapset *parent_privs,
List *translated_vars)
{
Bitmapset *child_privs = NULL;
bool whole_row;
int attno;
ListCell *lc;

/*
* TODO check that these offsets (i.e. FirstLowInvalidHeapAttributeNumber) work
* properly for YugaByte tables after #1129 (specifically INHERITS).
*/
const int firstLowInvalidAttrNumber = YBGetFirstLowInvalidAttributeNumber(parentrel);

/* System attributes have the same numbers in all tables */
for (attno = FirstLowInvalidHeapAttributeNumber + 1; attno < 0; attno++)
for (attno = firstLowInvalidAttrNumber + 1; attno < 0; attno++)
{
if (bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
if (bms_is_member(attno - firstLowInvalidAttrNumber,
parent_privs))
child_privs = bms_add_member(child_privs,
attno - FirstLowInvalidHeapAttributeNumber);
attno - firstLowInvalidAttrNumber);
}

/* Check if parent has whole-row reference */
whole_row = bms_is_member(InvalidAttrNumber - FirstLowInvalidHeapAttributeNumber,
whole_row = bms_is_member(InvalidAttrNumber - firstLowInvalidAttrNumber,
parent_privs);

/* And now translate the regular user attributes, using the vars list */
Expand All @@ -2031,10 +2032,12 @@ translate_col_privs(const Bitmapset *parent_privs,
if (var == NULL) /* ignore dropped columns */
continue;
if (whole_row ||
bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
bms_is_member(attno - firstLowInvalidAttrNumber,
parent_privs))
{
child_privs = bms_add_member(child_privs,
var->varattno - FirstLowInvalidHeapAttributeNumber);
var->varattno - firstLowInvalidAttrNumber);
}
}

return child_privs;
Expand Down
4 changes: 2 additions & 2 deletions src/postgres/src/backend/optimizer/util/var.c
Expand Up @@ -255,7 +255,7 @@ pull_varattnos_walker(Node *node, pull_varattnos_context *context)

/*
* The same as pull_varattnos() but attribute numbers are offset by
* (rel->min_attr + 1) instead of FirstLowInvalidHeapAttributeNumber.
* (rel->min_attr - 1) instead of FirstLowInvalidHeapAttributeNumber.
*/
void
pull_varattnos_min_attr(Node *node, Index varno, Bitmapset **varattnos, AttrNumber min_attr)
Expand All @@ -272,7 +272,7 @@ pull_varattnos_min_attr(Node *node, Index varno, Bitmapset **varattnos, AttrNumb

/*
* The same as pull_varattnos_walker() but attribute numbers are offset by
* (rel->min_attr + 1) instead of FirstLowInvalidHeapAttributeNumber.
* (rel->min_attr - 1) instead of FirstLowInvalidHeapAttributeNumber.
*/
static bool
pull_varattnos_walker_min_attr(Node *node, pull_varattnos_context *context, AttrNumber min_attr)
Expand Down
7 changes: 0 additions & 7 deletions src/postgres/src/backend/parser/gram.y
Expand Up @@ -1921,7 +1921,6 @@ AlterTableStmt:
}
| ALTER TABLE relation_expr partition_cmd
{
parser_ybc_signal_unsupported(@1, "ALTER TABLE WITH PARTITION COMMAND", 1124);
AlterTableStmt *n = makeNode(AlterTableStmt);
n->relation = $3;
n->cmds = list_make1($4);
Expand All @@ -1931,7 +1930,6 @@ AlterTableStmt:
}
| ALTER TABLE IF_P EXISTS relation_expr partition_cmd
{
parser_ybc_signal_unsupported(@1, "ALTER TABLE WITH PARTITION COMMAND", 1124);
AlterTableStmt *n = makeNode(AlterTableStmt);
n->relation = $5;
n->cmds = list_make1($6);
Expand Down Expand Up @@ -2112,7 +2110,6 @@ partition_cmd:
/* ALTER TABLE <name> ATTACH PARTITION <table_name> FOR VALUES */
ATTACH PARTITION qualified_name PartitionBoundSpec
{
parser_ybc_signal_unsupported(@1, "ALTER TABLE ATTACH PARTITION", 1124);
AlterTableCmd *n = makeNode(AlterTableCmd);
PartitionCmd *cmd = makeNode(PartitionCmd);

Expand All @@ -2126,7 +2123,6 @@ partition_cmd:
/* ALTER TABLE <name> DETACH PARTITION <partition_name> */
| DETACH PARTITION qualified_name
{
parser_ybc_signal_unsupported(@1, "ALTER TABLE DETACH PARTITION", 1124);
AlterTableCmd *n = makeNode(AlterTableCmd);
PartitionCmd *cmd = makeNode(PartitionCmd);

Expand Down Expand Up @@ -2889,7 +2885,6 @@ PartitionBoundSpec:
/* a LIST partition */
| FOR VALUES IN_P '(' partbound_datum_list ')'
{
parser_ybc_signal_unsupported(@1, "LIST PARTITION", 1126);
PartitionBoundSpec *n = makeNode(PartitionBoundSpec);

n->strategy = PARTITION_STRATEGY_LIST;
Expand All @@ -2903,7 +2898,6 @@ PartitionBoundSpec:
/* a RANGE partition */
| FOR VALUES FROM '(' range_datum_list ')' TO '(' range_datum_list ')'
{
parser_ybc_signal_unsupported(@1, "RANGE PARTITION", 1126);
PartitionBoundSpec *n = makeNode(PartitionBoundSpec);

n->strategy = PARTITION_STRATEGY_RANGE;
Expand Down Expand Up @@ -4212,7 +4206,6 @@ OptPartitionSpec: PartitionSpec { $$ = $1; }

PartitionSpec: PARTITION BY part_strategy '(' part_params ')'
{
parser_ybc_signal_unsupported(@1, "PARTITION BY", 1126);
PartitionSpec *n = makeNode(PartitionSpec);

n->strategy = $3;
Expand Down

0 comments on commit 955a872

Please sign in to comment.