[DocDB] Pack inserted value #20713

Closed
1 task done
spolitov opened this issue Jan 20, 2024 · 1 comment
Labels: area/docdb (YugabyteDB core features), kind/bug (This issue is a bug), priority/medium (Medium priority issue)

Comments


spolitov commented Jan 20, 2024

Jira Link: DB-9716

Description

When inserting a row with multiple values, we send every column as a separate protobuf with a complex structure and many duplicated fields.
As a result, during bulk load we spend a significant amount of time parsing and analyzing those protobufs.
Instead, we could pack all values in the Postgres layer and then insert them directly into DocDB.
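
To make the idea concrete, here is a minimal Python sketch contrasting one nested message per column with a single packed buffer per row. Everything in it is an assumption for illustration: the function names are hypothetical, and the byte layout (a count followed by fixed-width int32 values) is not the DocDB packed row encoding.

```python
import struct


def per_column_messages(values, first_column_id=11):
    """Hypothetical per-column form, loosely mirroring the nested
    column_values { column_id, expr { value { int32_value } } } layout."""
    return [
        {"column_id": first_column_id + i, "expr": {"value": {"int32_value": v}}}
        for i, v in enumerate(values)
    ]


def pack_row(values):
    """Illustrative packing only: a 2-byte count, then little-endian int32 values.
    This is NOT the real DocDB packed row format."""
    return struct.pack(f"<H{len(values)}i", len(values), *values)


def unpack_row(buf):
    """Inverse of pack_row: read the count, then that many int32 values."""
    (count,) = struct.unpack_from("<H", buf, 0)
    return list(struct.unpack_from(f"<{count}i", buf, 2))
```

With this layout a row of N int32 columns becomes one buffer of 2 + 4·N bytes that the receiver decodes in a single pass, instead of N nested messages that each have to be parsed separately.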

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@spolitov spolitov added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jan 20, 2024
@spolitov spolitov self-assigned this Jan 20, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jan 20, 2024
@spolitov
Contributor Author

For reference, inserting 2 rows with 10 column values each currently produces the following write batches:

pgsql_write_batch {
  client: YQL_CLIENT_PGSQL
  stmt_id: 4560228640
  stmt_type: PGSQL_INSERT
  table_id: "00004000000030008000000000004000"
  schema_version: 0
  ybctid_column_value {
    value {
      binary_value: "H\200\001\202\271!"
    }
  }
  column_values {
    column_id: 11
    expr {
      value {
        int32_value: 47279295
      }
    }
  }
  column_values {
    column_id: 12
    expr {
      value {
        int32_value: 46652250
      }
    }
  }
  column_values {
    column_id: 13
    expr {
      value {
        int32_value: 79381197
      }
    }
  }
  column_values {
    column_id: 14
    expr {
      value {
        int32_value: 35803789
      }
    }
  }
  column_values {
    column_id: 15
    expr {
      value {
        int32_value: 92005163
      }
    }
  }
  column_values {
    column_id: 16
    expr {
      value {
        int32_value: 46730059
      }
    }
  }
  column_values {
    column_id: 17
    expr {
      value {
        int32_value: 30716606
      }
    }
  }
  column_values {
    column_id: 18
    expr {
      value {
        int32_value: 35252161
      }
    }
  }
  column_values {
    column_id: 19
    expr {
      value {
        int32_value: 93603689
      }
    }
  }
  column_values {
    column_id: 20
    expr {
      value {
        int32_value: 74867912
      }
    }
  }
  column_refs {
  }
  ysql_catalog_version: 1
  partition_key: "H\200\001\202\271!"
}
pgsql_write_batch {
  client: YQL_CLIENT_PGSQL
  stmt_id: 4635500832
  stmt_type: PGSQL_INSERT
  table_id: "00004000000030008000000000004000"
  schema_version: 0
  ybctid_column_value {
    value {
      binary_value: "H\200\001\202\272!"
    }
  }
  column_values {
    column_id: 11
    expr {
      value {
        int32_value: 10244035
      }
    }
  }
  column_values {
    column_id: 12
    expr {
      value {
        int32_value: 41177491
      }
    }
  }
  column_values {
    column_id: 13
    expr {
      value {
        int32_value: 10216579
      }
    }
  }
  column_values {
    column_id: 14
    expr {
      value {
        int32_value: 74231766
      }
    }
  }
  column_values {
    column_id: 15
    expr {
      value {
        int32_value: 65619415
      }
    }
  }
  column_values {
    column_id: 16
    expr {
      value {
        int32_value: 53048433
      }
    }
  }
  column_values {
    column_id: 17
    expr {
      value {
        int32_value: 81499950
      }
    }
  }
  column_values {
    column_id: 18
    expr {
      value {
        int32_value: 76332784
      }
    }
  }
  column_values {
    column_id: 19
    expr {
      value {
        int32_value: 58926211
      }
    }
  }
  column_values {
    column_id: 20
    expr {
      value {
        int32_value: 8846646
      }
    }
  }
  column_refs {
  }
  ysql_catalog_version: 1
  partition_key: "H\200\001\202\272!"
}

spolitov added a commit that referenced this issue Jan 23, 2024
Summary:
When inserting a row with multiple values, we send every column as a separate protobuf with a complex structure and many duplicated fields.
As a result, during bulk load we spend a significant amount of time parsing and analyzing those protobufs.
Instead, we could pack all values in the Postgres layer and then insert them directly into DocDB.

Controlled by the newly added preview flag ysql_pack_inserted_value.

Currently the row is packed using v1 encoding, because v2 is not yet in a release state.

Performance comparison using PgSingleTServerTest.ScanWithPackedRow insert time:
master (fac37c6) - 30.08s
this diff - 25.30s

Bulk load comparison of 30M rows using the following script:
```
drop table if exists test_table;
create extension if not exists pgcrypto;
create table test_table(k INT, v1 INT, v2 INT, v3 INT, v4 INT, v5 INT, PRIMARY KEY(k ASC));

do $$
begin
  for counter in 1..30000 loop
    if counter % 1000 = 0 then
      raise notice 'counter: %', (counter * 1000);
    end if;
    insert into test_table (select i + counter*1000, random()*1000000000, random()*1000000000, random()*1000000000, random()*1000000000, random()*1000000000
                        from generate_series(1, 1000) i);
    commit;
  end loop;
end $$;
```

master - 14m51.963s
this diff - 12m19.418s
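
The relative gain can be worked out from the timings above. The following is a small Python sketch of that arithmetic; the helper names are hypothetical, and the inputs are copied from this summary.

```python
import re


def parse_duration(text):
    """Convert a `time`-style string such as '14m51.963s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def reduction_percent(before, after):
    """Percentage of the original time saved."""
    return (before - after) / before * 100.0


master = parse_duration("14m51.963s")
this_diff = parse_duration("12m19.418s")
print(f"bulk load: {reduction_percent(master, this_diff):.1f}% less time")
print(f"ScanWithPackedRow insert: {reduction_percent(30.08, 25.30):.1f}% less time")
```

By this calculation the bulk load takes roughly 17% less time and the ScanWithPackedRow insert roughly 16% less.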
Jira: DB-9716

Test Plan: PgPackedInsertTest

Reviewers: tnayak, mbautin

Reviewed By: mbautin

Subscribers: yql, mbautin, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31602
@rthallamko3 rthallamko3 removed the status/awaiting-triage Issue awaiting triage label Feb 21, 2024
spolitov added a commit that referenced this issue Feb 22, 2024
Summary:
Currently, when inserting multiple rows into a single table, we generate independent operations.
Those operations contain a lot of duplicated information.
So we have to repeat the same steps while processing each of them, such as resolving the table, checking the schema version, etc.
We also have to allocate this repeated data on the sender side.

This diff changes the protocol to use a single write operation in such cases.

The feature is not completely ready, because it works only for inserts to the same tablet.
This should be addressed in follow-up diffs.
The recently added preview gflag ysql_pack_inserted_value can be used to enable or disable the feature.
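
As a rough illustration of the idea (not the actual protocol), the Python sketch below folds per-row inserts into one write operation per group, so fields such as the table id and schema version are stored once. All names here (`WriteOp`, `batch_inserts`, `tablet_for_key`) are hypothetical, and grouping by tablet is only one assumed way of respecting the same-tablet restriction mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class WriteOp:
    """Hypothetical combined operation: shared fields once, then many rows."""
    table_id: str
    schema_version: int
    rows: list = field(default_factory=list)  # (key, packed_value) pairs


def batch_inserts(inserts, tablet_for_key):
    """Group (table_id, schema_version, key, packed_value) tuples so that each
    (table, schema version, tablet) combination yields a single WriteOp."""
    ops = {}
    for table_id, schema_version, key, packed in inserts:
        group = (table_id, schema_version, tablet_for_key(key))
        op = ops.get(group)
        if op is None:
            op = ops[group] = WriteOp(table_id, schema_version)
        op.rows.append((key, packed))
    # dict preserves insertion order, so operations come out in first-seen order.
    return list(ops.values())
```

The receiver can then resolve the table and check the schema version once per operation rather than once per row.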

Performance comparison using PgSingleTServerTest.ScanWithPackedRow:
```
this diff
Insert full time - 17.66s
Insert TServer time - 11.37s

master (d3fca95, packed inserted disabled):
Insert full time - 32.44s
Insert TServer time - 15.00s

master (d3fca95, packed inserted enabled):
Insert full time - 26.49s
Insert TServer time - 13.38s
```
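
One way to read these numbers is to split each full insert time into the TServer part and the remainder. The Python sketch below does that under an explicit assumption: that the TServer time is contained in the full time, so the difference approximates time spent elsewhere (for example on the sender side). The source does not state this, so treat the breakdown as indicative only.

```python
# (full insert time, TServer insert time) in seconds, copied from the results above.
results = {
    "this diff": (17.66, 11.37),
    "master, packed insert disabled": (32.44, 15.00),
    "master, packed insert enabled": (26.49, 13.38),
}


def breakdown(full, tserver):
    # Assumption: TServer time is part of the full time, so the remainder
    # approximates time spent outside the TServer.
    return {"full": full, "tserver": tserver, "other": round(full - tserver, 2)}


baseline_full = results["master, packed insert disabled"][0]
for name, (full, tserver) in results.items():
    parts = breakdown(full, tserver)
    saved = (baseline_full - full) / baseline_full * 100.0
    print(f"{name}: other={parts['other']:.2f}s, full time {saved:.1f}% below baseline")
```

Under that assumption, the non-TServer portion shrinks more than the TServer portion does, which is consistent with the note above about repeated data being allocated on the sender side.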
Jira: DB-9716

Test Plan: Jenkins

Reviewers: mbautin, tnayak

Reviewed By: mbautin, tnayak

Subscribers: yql, bogdan, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31930