[docs] [migrate] changes to import and export sections using the COPY command. #13164

Merged
merged 10 commits into from
Jul 25, 2022
99 changes: 9 additions & 90 deletions docs/content/preview/migrate/manual-import/_index.md
@@ -2,9 +2,9 @@
title: Manual import
headerTitle: Manual import
linkTitle: Manual import
description: Migrate PostgreSQL data to YugabyteDB.
description: Manual PostgreSQL import to YugabyteDB.
image: /images/section_icons/develop/learn.png
headcontent: Migrate PostgreSQL data to YugabyteDB using ysql_dump.
headcontent: Manual PostgreSQL import to YugabyteDB.
aliases:
- /preview/migrate/migrate-from-postgresql/
menu:
@@ -17,94 +17,13 @@ type: indexpage

The steps below cover how to manually migrate PostgreSQL data and applications to YugabyteDB.

- [Convert a PostgreSQL schema](migrate-schema/)
- [Migrate a PostgreSQL application](migrate-application/)
- [Export PostgreSQL data](export-data/)
- [Prepare a cluster](prepare-cluster/)
- [Import PostgreSQL data](import-data/)
- [Verify a migration](verify-migration/)

{{< tip title="Migrate using YugabyteDB Voyager" >}}
To automate your migration from PostgreSQL to YugabyteDB, use [YugabyteDB Voyager](../yb-voyager/).
{{< /tip >}}

<div class="row">

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="migrate-schema/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Migrate a DDL schema</div>
</div>
<div class="body">
Migrate your DDL schema from PostgreSQL to YugabyteDB.
</div>
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="migrate-application/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Migrate a PostgreSQL application</div>
</div>
<div class="body">
Migrate a PostgreSQL application to YugabyteDB.
</div>
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="export-data/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Export PostgreSQL data</div>
</div>
<div class="body">
Export data from PostgreSQL for importing into YugabyteDB.
</div>
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="prepare-cluster/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Prepare a cluster</div>
</div>
<div class="body">
Prepare your YugabyteDB cluster for data import.
</div>
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="import-data/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Import PostgreSQL data</div>
</div>
<div class="body">
Import PostgreSQL data into a YugabyteDB cluster.
</div>
</a>
</div>

<div class="col-12 col-md-6 col-lg-12 col-xl-6">
<a class="section-link icon-offset" href="verify-migration/">
<div class="head">
<div class="icon">
<i class="icon-database-alt2"></i>
</div>
<div class="title">Verify the migration</div>
</div>
<div class="body">
Verify the migration to YugabyteDB was successful.
</div>
</a>
</div>

</div>
86 changes: 57 additions & 29 deletions docs/content/preview/migrate/manual-import/export-data.md
@@ -13,57 +13,85 @@ menu:
type: docs
---

The recommended way to export data from PostgreSQL for purposes of importing it to YugabyteDB is using the CSV format.

## Exporting an entire database
The recommended way to export data from PostgreSQL for import into YugabyteDB is via CSV files using the COPY command.
However, for smaller datasets, you can export an entire database in one step using the YugabyteDB [`ysql_dump`](../../../admin/ysql-dump/) utility.

The recommended way to dump an entire database from PostgreSQL is to use the YugabyteDB [`ysql_dump`](../../../admin/ysql-dump/) backup utility, which is in turn derived from PostgreSQL pg_dump.
{{< tip title="Migrate using YugabyteDB Voyager" >}}
To automate your migration from PostgreSQL to YugabyteDB, use [YugabyteDB Voyager](../../yb-voyager/). To learn more, refer to the [export schema](../../yb-voyager/migrate-steps/#export-and-analyze-schema) and [export data](../../yb-voyager/migrate-steps/#export-data) steps.
{{< /tip >}}

```sh
$ ysql_dump -d mydatabase > mydatabase-dump.sql
## Export data into CSV files using the COPY command

To export the data, connect to the source PostgreSQL database using the psql tool, and execute the COPY TO command as follows:

```sql
COPY <table_name>
TO '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```
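
For example, a sketch of exporting a hypothetical `users` table (the table and file names are illustrative):

```sql
COPY users
TO '/tmp/users.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

Note that COPY TO writes the file on the database server. To write the file to the client machine instead, you can use psql's `\copy` meta-command with the same options.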

{{< note title="Note" >}}
The `ysql_dump` approach has been tested on PostgreSQL v11.2, and may not work on very new versions of PostgreSQL. To export an entire database in these cases, use the pg_dump tool, which is documented in detail in the [PostgreSQL documentation on pg_dump](https://www.postgresql.org/docs/12/app-pgdump.html).
{{< /note >}}

## Export using COPY

This is an alternative to using ysql_dump in order to export a single table from the source PostgreSQL database into CSV files. This tool allows extracting a subset of rows and/or columns from a table. This can be achieved by connecting to the source DB using psql and using the `COPY TO` command, as shown below.
The COPY TO command exports a single table, so you should execute it for every table that you want to export.

```sql
COPY mytable TO 'export-1.csv' DELIMITER ',' CSV HEADER;
```
{{< /note >}}

To extract a subset of rows from a table, it is possible to output the result of an SQL command.
It is also possible to export a subset of rows based on a condition:

```sql
COPY (
SELECT * FROM mytable
WHERE <where condition>
) TO 'export-1.csv' DELIMITER ',' CSV HEADER;
SELECT * FROM <table_name>
WHERE <condition>
)
TO '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```
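
For instance, a sketch that exports only recent rows from a hypothetical `orders` table (the table, column, and date are illustrative):

```sql
COPY (
  SELECT * FROM orders
  WHERE created_at >= '2023-01-01'
)
TO '/tmp/orders_2023.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```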

The various options here are described in detail in the [PostgreSQL documentation for the COPY command](https://www.postgresql.org/docs/12/sql-copy.html).
For all available options provided by the COPY TO command, refer to the [PostgreSQL documentation](https://www.postgresql.org/docs/current/sql-copy.html).

## Run large table exports in parallel
### Parallelize large table export

Exporting large data sets from PostgreSQL can be made efficient by running multiple COPY processes in parallel for a subset of data. This will result in multiple csv files being produced, which can subsequently be imported in parallel.

An example of running multiple exports in parallel is shown below. Remember to use a suitable value for *num_rows_per_export*, for example 1 million rows.
For large tables, it might be beneficial to parallelize the process by exporting data in chunks as follows:

```sql
COPY (
SELECT * FROM mytable
ORDER BY primary_key_col
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT num_rows_per_export OFFSET 0
) TO 'export-1.csv' DELIMITER ',' CSV HEADER;
)
TO '<table_name>_1.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

```sql
COPY (
SELECT * FROM mytable
ORDER BY primary_key_col
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT num_rows_per_export OFFSET num_rows_per_export
) TO 'export-2.csv' WITH CSV;
)
TO '<table_name>_2.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

...
```sql
COPY (
SELECT * FROM <table_name>
ORDER BY <primary_key_col>
LIMIT num_rows_per_export OFFSET num_rows_per_export * 2
)
TO '<table_name>_3.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER);
```

You can run the above commands in parallel to speed up the process. This approach will also produce multiple CSV files, allowing for parallel import on the YugabyteDB side.
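
A minimal shell sketch of this approach, assuming a hypothetical `users` table keyed on `id`, chunks of one million rows, and a source database named `mydatabase` (all illustrative):

```sh
#!/bin/bash
# Export the hypothetical "users" table in three chunks of 1,000,000 rows each,
# running the COPY commands concurrently against the source PostgreSQL database.
ROWS=1000000
for i in 0 1 2; do
  OFFSET=$((i * ROWS))
  psql -d mydatabase -c "COPY (
      SELECT * FROM users
      ORDER BY id
      LIMIT ${ROWS} OFFSET ${OFFSET}
    ) TO '/tmp/users_$((i + 1)).csv'
    WITH (FORMAT CSV, DELIMITER ',', HEADER);" &
done
wait   # Wait for all background exports to finish.
```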

## Export data into SQL script using ysql_dump

An alternative way to export the data is to use the YugabyteDB [`ysql_dump`](../../../admin/ysql-dump/) backup utility, which is derived from PostgreSQL's pg_dump.

```sh
$ ysql_dump -d <database_name> > <database_name>.sql
```

`ysql_dump` is a good option for smaller datasets because it allows you to export a whole database with a single command. For large databases, however, the COPY command is recommended because it offers significantly better performance.
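
If you only need a handful of tables rather than the whole database, `ysql_dump` (being derived from pg_dump) also accepts the `-t`/`--table` option. The following is a sketch assuming a hypothetical `users` table in a database named `mydatabase`:

```sh
# Dump only the hypothetical "users" table (illustrative names).
$ ysql_dump -d mydatabase -t users > users.sql
```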
101 changes: 87 additions & 14 deletions docs/content/preview/migrate/manual-import/import-data.md
@@ -13,28 +13,101 @@ menu:
type: docs
---

The next step is to import the PostgreSQL data into YugabyteDB.
{{< tip title="Migrate using YugabyteDB Voyager" >}}
To automate your migration from PostgreSQL to YugabyteDB, use [YugabyteDB Voyager](../../yb-voyager/). To learn more, refer to the [import schema](../../yb-voyager/migrate-steps/#import-schema) and [import data](../../yb-voyager/migrate-steps/#import-data) steps.
{{< /tip >}}

{{< note title="Note" >}}
After the data import step, remember to recreate any constraints and triggers that might have been disabled to speed up loading the data. This would ensure that the database will perform relational integrity checking for data going forward.
{{< /note >}}
## Import data from CSV files

To import data that was previously exported into CSV files, use the COPY FROM command as follows:

```sql
COPY <table_name>
FROM '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER, ROWS_PER_TRANSACTION 1000, DISABLE_FK_CHECK);
```

## Import a database
In the preceding command, the `ROWS_PER_TRANSACTION` parameter splits the load into smaller transactions (1000 rows each in this example) instead of running a single transaction spanning all the data in the file. Additionally, the `DISABLE_FK_CHECK` parameter skips foreign key checks for the duration of the import.

To import an entire database from a `pg_dump` or `ysql_dump` export, use `ysqlsh`. The command should look as shown below.
Both the `ROWS_PER_TRANSACTION` and `DISABLE_FK_CHECK` parameters are recommended for the initial import of the data, especially for large tables, because they significantly reduce the total import time. To further speed up the process, you can import multiple files in a single COPY command, as in the following example:

```sh
$ ysqlsh -f <db-sql-script>
```sql
yugabyte=# \! ls t*.txt
```

```output
t1.txt t2.txt t3.txt
```

{{< tip title="Tip" >}}
The `ysqlsh` tool is a derivative of the PostgreSQL tool, `psql`. All `psql` commands would work in `ysqlsh`.
{{< /tip >}}
```sql
yugabyte=# \! cat t*.txt
```

```output
1,2,3
4,5,6
7,8,9
```

```sql
yugabyte=# \d t
```

```output
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1 | integer | | |
c2 | integer | | |
c3 | integer | | |
```

```sql
yugabyte=# SELECT * FROM t;
```

```output
 c1 | c2 | c3
----+----+----
(0 rows)
```

## Import a table using COPY FROM
```sql
yugabyte=# COPY t FROM PROGRAM 'cat /home/yugabyte/t*.txt' WITH (FORMAT CSV, DELIMITER ',', ROWS_PER_TRANSACTION 1000, DISABLE_FK_CHECK);
COPY 3
```

```sql
yugabyte=# SELECT * FROM t;
```

```output
 c1 | c2 | c3
----+----+----
  7 |  8 |  9
  4 |  5 |  6
  1 |  2 |  3
(3 rows)
```

For detailed information on the COPY FROM command, refer to the [COPY](../../../api/ysql/the-sql-language/statements/cmd_copy/) statement reference.

### Error handling

If the COPY FROM command fails partway through, you can rerun it; however, you don't have to rerun the entire file. COPY FROM imports rows one by one, starting from the top of the file, so if you know that some rows were successfully imported before the failure, you can safely skip them by adding the SKIP parameter.

Importing a single table (or a partial export from a table) can be done by running the COPY FROM command, and providing it the location of the export file prepared in a previous step. This should look as shown below.
For example, to skip the first 5000 rows in a file, run the command as follows:

```sql
COPY country FROM 'export.csv' DELIMITER ',' CSV HEADER;
COPY <table_name>
FROM '<table_name>.csv'
WITH (FORMAT CSV, DELIMITER ',', HEADER, ROWS_PER_TRANSACTION 1000, DISABLE_FK_CHECK, SKIP 5000);
```

## Import data from SQL script

To import an entire database from a `pg_dump` or `ysql_dump` export, use `ysqlsh` as follows:

```sh
$ ysqlsh -f <database_name>.sql
```
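
Because `ysqlsh` is derived from psql, the usual connection flags apply. A sketch with illustrative host, user, and database values (5433 is the default YSQL port):

```sh
# Connect to a specific host, port, and user before running the script (illustrative values).
$ ysqlsh -h 127.0.0.1 -p 5433 -U yugabyte -d mydatabase -f mydatabase.sql
```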

{{< note title="Note" >}}

After the data import step, remember to recreate any constraints and triggers that were disabled to speed up loading the data. This ensures that the database enforces relational integrity going forward.

{{< /note >}}
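
For example, a minimal sketch of re-enabling a trigger and recreating a foreign key after the import completes, assuming hypothetical `orders` and `customers` tables (standard PostgreSQL syntax; all names are illustrative):

```sql
-- Re-enable a trigger that was disabled before the bulk load.
ALTER TABLE orders ENABLE TRIGGER orders_audit_trigger;

-- Recreate a foreign key constraint that was dropped to speed up the import.
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (id);
```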